Abstract. This paper is devoted to establishing quantitative and qualitative estimates
related to the notion of chaos as first formulated by M. Kac [41] in his study of the
mean-field limit for systems of $N$ indistinguishable particles as $N \to \infty$.

First, we quantitatively compare three usual measures of Kac's chaos, some involving
all $N$ variables, others involving a fixed finite number of variables. The cornerstone of
the proof is, on the one hand, a new representation of the Monge-Kantorovich-Wasserstein
(MKW) distance for symmetric $N$-particle probability measures in terms of the distance
between the laws of the associated empirical measures and, on the other hand, a new
estimate on some MKW distance on spaces of probability measures endowed with a suitable
Hilbert norm, taking advantage of the associated good algebraic structure.

Next, we define the notions of entropy chaos and Fisher information chaos in a way
similar to that of Carlen et al. [17]. We show that Fisher information chaos is stronger
than entropy chaos, which in turn is stronger than Kac's chaos. More importantly,
with the help of the HWI inequality of Otto-Villani, we establish a quantitative estimate
between these quantities, which in particular asserts that Kac's chaos plus a Fisher
information bound implies entropy chaos.

We then extend the above quantitative and qualitative results about chaos to the
framework of probability measures supported on Kac's spheres, revisiting [17] and
giving a possible answer to [17, Open problem 11]. In addition to the above-mentioned
tools, we prove and use an optimal rate local CLT in $L^\infty$ norm for distributions with
finite 6-th moment and finite $L^p$ norm, for some $p > 1$.

Last, we investigate how our techniques can be used without assuming chaos, in the
context of mixtures of probability measures introduced by De Finetti, Hewitt and Savage.
In particular, we define the (level 3) Fisher information for mixtures and prove that it is
l.s.c. and affine, as was done in [64] for the level 3 Boltzmann entropy.

1. Introduction and main results

Kac's notion of chaos rigorously formalizes the intuitive idea that a family of random
vectors with $N$ coordinates has asymptotically independent coordinates as
$N$ goes to infinity. We refer to [67] for an introduction to that topic from a probabilistic
point of view, as well as to [54] for a recent and short survey.

Definition 1.1. [41, section 3] Consider $E \subset \mathbb{R}^d$, $f \in P(E)$ a probability measure on
$E$ and $G^N \in P_{sym}(E^N)$ a sequence of probability measures on $E^N$, $N \ge 1$, which are
invariant under coordinates permutations. We say that $(G^N)$ is $f$-Kac's chaotic (or has
the "Boltzmann property") if

(1.1)    $\forall\, j \ge 1, \quad G^N_j \rightharpoonup f^{\otimes j}$ weakly in $P(E^j)$ as $N \to \infty$,

where $G^N_j$ stands for the $j$-th marginal of $G^N$ defined by

$G^N_j := \int_{E^{N-j}} G^N \, dx_{j+1} \dots dx_N$.

Interacting systems of $N$ indistinguishable particles are naturally described by
exchangeable random variables (which corresponds to the fact that their associated
probability laws are symmetric, i.e. invariant under coordinates permutations), but they
are not described by random variables with independent coordinates (which corresponds to
the fact that their associated probability laws are tensor products) except in situations
with no interaction. Kac's chaos is therefore a well-adapted concept to formulate and
investigate the limit of an infinite number of particles $N \to \infty$ for these systems, as
has been illustrated by many works since the seminal article by Kac [41]. Using the above
definition of chaos, it is shown in [41, 49, 50, 35, 55] that if $f(t)$ evolves according to
the nonlinear space homogeneous Boltzmann equation, $G^N(t)$ evolves according to the linear
Master/Kolmogorov equation associated to the stochastic Kac-Boltzmann jump (collision)
process and $G^N(0)$ is $f(0)$-chaotic, then for any later time $t > 0$ the sequence $G^N(t)$ is
also $f(t)$-chaotic: in other words, propagation of chaos holds for that model. As explained
in the latest reference, and using the uniqueness of statistical solutions proved in [2],
some of these propagation of chaos results can be seen as an illustration of the "BBGKY
hierarchy method", whose most famous success is Lanford's proof of the "Boltzmann-Grad
limit" [43].

In order to investigate quantitative versions of Kac's chaos, the weak convergence
in (1.1) can be formulated in terms of the Monge-Kantorovich-Wasserstein (MKW)
transportation distance between $G^N_j$ and $f^{\otimes j}$. More precisely, given $d_E$ a bounded
distance on $E$, we define the normalized distance $d_{E^j}$ on $E^j$, $j \in \mathbb{N}^*$, by setting

(1.2)    $\forall\, X = (x_1, \dots, x_j),\ Y = (y_1, \dots, y_j) \in E^j, \quad d_{E^j}(X,Y) := \frac{1}{j} \sum_{i=1}^j d_E(x_i, y_i)$,

and then we define $W_1$ (without specifying the dependence on $j$) the associated MKW
distance in $P(E^j)$ (see the definition (2.2) below). With the notations of Definition 1.1,
$(G^N)$ is $f$-Kac's chaotic if, and only if,

$\forall\, j \ge 1, \quad \Omega_j(G^N; f) := W_1(G^N_j, f^{\otimes j}) \to 0$ as $N \to \infty$.

Let us now introduce another formulation of Kac's chaos, which we first state in a
probabilistic language. For any $X = (x_1, \dots, x_N) \in E^N$, we define the associated empirical
measure

(1.3)    $\mu^N_X(dy) := \frac{1}{N} \sum_{i=1}^N \delta_{x_i}(dy) \in P(E)$.

We say that an exchangeable $E^N$-valued random vector $\mathcal{X}^N$ is $f$-chaotic if the associated
$P(E)$-valued random variable $\mu^N_{\mathcal{X}^N}$ converges to the deterministic random variable $f$ in
law in $P(E)$:

(1.4)    $\mu^N_{\mathcal{X}^N} \Rightarrow f$ in law as $N \to \infty$.

In the framework of Definition 1.1, the convergence (1.4) can be equivalently formulated
in the following way. Introducing $G^N := \mathcal{L}(\mathcal{X}^N)$ the law of $\mathcal{X}^N$, the exchangeability
hypothesis means that $G^N \in P_{sym}(E^N)$. Next, the law $\hat G^N := \mathcal{L}(\mu^N_{\mathcal{X}^N})$ of $\mu^N_{\mathcal{X}^N}$ is nothing
but the (unique) measure $\hat G^N \in P(P(E))$ such that

$\langle \hat G^N, \Phi \rangle = \int_{E^N} \Phi(\mu^N_X) \, G^N(dX) \quad \forall\, \Phi \in C_b(P(E))$,

or equivalently the push-forward of $G^N$ by the "empirical distribution" application.
Then the convergence (1.4) just means that

(1.5)    $\hat G^N \to \delta_f$ weakly in $P(P(E))$ as $N \to \infty$,

where this definition does not refer anymore to the random variables $\mathcal{X}^N$ or $\mu^N_{\mathcal{X}^N}$. It is
well known (see for instance [36, section 4], [41, 69, 66] and [67, Proposition 2.2]) that for
a sequence $(G^N)$ of $P_{sym}(E^N)$ and a probability measure $f \in P(E)$ the three following
assertions are equivalent:

(i) convergence (1.1) holds for any $j \ge 1$;

(ii) convergence (1.1) holds for some $j \ge 2$;

(iii) convergence (1.5) holds;

so that in particular (1.1) and (1.5) are indeed equivalent formulations of Kac's chaos.
The chaos formulation (ii) has been used since [41], while the chaos formulation (iii) is
widely used in the works by Sznitman [65], see also [66, 52, 59], where the chaos property is
established by proving that the "empirical process" $\mu^N_{\mathcal{X}^N}$ converges to a limit process with
values in $P(E)$ which is a solution to a nonlinear martingale problem associated to the
mean-field limit equation. Formulation (1.5) is also well adapted for proving quantitative
propagation of chaos for deterministic dynamics associated to the Vlasov equation with
regular interaction force [27] as well as singular interaction force [39, 38]. Let us briefly
explain this point now, see also [54, section 1.1]. On the one hand, introducing the MKW
transport distance $\mathcal{W}_1 := \mathcal{W}_{W_1}$ on $P(P(E))$ based on the MKW distance $W_1$ on $P(E)$
(see definition (2.6) below), the weak convergence (1.5) is nothing but the fact that

$\Omega_\infty(G^N; f) := \mathcal{W}_1(\hat G^N, \delta_f) \to 0$ as $N \to \infty$.
On the other hand, for the Vlasov equation with smooth and bounded force term, it is
proved in [27] that

(1.6)    $\forall\, T > 0,\ \forall\, t \in [0,T], \quad W_1(\mu^N_{\mathcal{X}^N_t}, f_t) \le C_T \, W_1(\mu^N_{\mathcal{X}^N_0}, f_0)$,

where $f_t \in P(E)$ is the solution to the Vlasov equation with initial datum $f_0$ and $\mathcal{X}^N_t \in E^N$
is the solution to the associated system of ODEs with initial datum $\mathcal{X}^N_0$. Inequality (1.6)
is a consequence of the fact that $t \mapsto \mu^N_{\mathcal{X}^N_t}$ solves the Vlasov equation and that a local $W_1$
stability result holds for such an equation. When $\mathcal{X}^N_0$ is distributed according to an initial
density $G^N_0 \in P_{sym}(E^N)$, we may show that $\mathcal{X}^N_t$ is distributed according to $G^N_t \in P_{sym}(E^N)$,
obtained as the transported measure along the flow associated to the above-mentioned
system of ODEs; equivalently, $G^N_t$ is the solution to the associated Liouville equation
with initial condition $G^N_0$. Taking the expectation on both sides of (1.6), we get

$\int_{E^N} W_1(\mu^N_Y, f_t) \, G^N_t(dY) = E[W_1(\mu^N_{\mathcal{X}^N_t}, f_t)]$
$\qquad \le C_T \, E[W_1(\mu^N_{\mathcal{X}^N_0}, f_0)] = C_T \int_{E^N} W_1(\mu^N_Y, f_0) \, G^N_0(dY)$,

for any $t \in [0,T]$. We conclude with the following quantitative chaos propagation estimate

$\forall\, t \in [0,T], \quad \Omega_\infty(G^N_t; f_t) \le C_T \, \Omega_\infty(G^N_0; f_0)$.

It is worth mentioning that, partially inspired by [36], a similar inequality is shown
in [57, 55] for more general models including drift, diffusion and collisional
interactions, where however the estimate may mix several chaos quantification quantities,
such as $\Omega_\infty$ and $\Omega_2$ for instance.

There exists at least one more way to guarantee chaoticity, which is very popular because
that chaos formulation naturally appears in the probabilistic coupling technique, see [67],
as well as [47, 12, 11] and the references therein.

Thanks to coupling techniques, we typically may show that an exchangeable
$E^N$-valued random vector $\mathcal{X}^N$ satisfies

$E\Big( \frac{1}{N} \sum_{i=1}^N |\mathcal{X}^N_i - \mathcal{Y}^N_i| \Big) \to 0$ as $N \to \infty$,

for some $E^N$-valued random vector $\mathcal{Y}^N$ with independent coordinates. Denoting by
$G^N \in P_{sym}(E^N)$ the law of $\mathcal{X}^N$, $f$ the law of one coordinate $\mathcal{Y}^N_i$, and $W_1$ the MKW transport
distance on $P(E^N)$ based on the normalized distance $d_{E^N}$ in $E^N$ defined by (1.2), the
above convergence readily implies

(1.7)    $\Omega_N(G^N; f) := W_1(G^N, f^{\otimes N}) \to 0$ as $N \to \infty$,

which in turn guarantees that $(G^N)$ is $f$-chaotic. It is generally agreed that the convergence
(1.7) is a strong version of chaos, maybe because it involves all $N$ variables, while
Kac's original definition only involves a fixed finite number of variables.

Summary of Section 2. The first natural question we consider is about the equivalence
between these definitions of chaos, and more precisely the possibility of comparing them in a
quantitative way. The following result gives a positive answer; we also refer to Theorem 2.4
in section 2 for a more accurate statement.

Theorem 1.2 (Equivalence of measure for Kac's chaos). For any moment order $k > 0$ and
any positive exponent $\gamma < (d + 1 + d/k)^{-1}$, there exists a constant $C = C(d, k, \gamma) \in (0, \infty)$
such that for any $f \in P(E)$, any $G^N \in P_{sym}(E^N)$, $N \ge 1$, and any $j, \ell \in \{1, \dots, N\} \cup \{\infty\}$,
$\ell \ne 1$, there holds

$\Omega_j(G^N; f) \le C \, \mathcal{M}_k^{1/k} \, \big( \Omega_\ell(G^N; f) + \frac{1}{N} \big)^\gamma$,

where $\mathcal{M}_k = M_k(f) + M_k(G^N_1)$ is the sum of the moments of order $k$ of $f$ and $G^N_1$.

It is worth emphasizing that the above inequality is definitely false in general for
$\ell = 1$. The first outcome of our theorem is that it shows that, regardless of the rate,
the propagation of chaos results obtained by the coupling method are of the same nature
as the propagation of chaos results obtained by the "BBGKY hierarchy method" and the
"empirical measures method".

The proof of Theorem 2.4 (from which Theorem 1.2 follows) will be presented in
section 2. Let us briefly explain the strategy. First, the fact that we may control $\Omega_j$ by $\Omega_\ell$ for
$1 \le j \le \ell \le N$ is classical and quite easy. Next, we will establish an estimate of $\Omega_\infty$ by $\Omega_2$
following an idea introduced in [55]: we begin by proving a similar estimate where we replace
$\Omega_\infty$ by the MKW distance in $P(P(E))$ associated to the $H^{-s}(\mathbb{R}^d)$ norm, $s > (d+1)/2$, on
$P(E)$ in order to take advantage of the good algebraic structure of that Hilbert norm, and
then we come back to $\Omega_\infty$ thanks to the "uniform topological equivalence" of metrics in
$P(E)$ and the Hölder inequality. Finally, and that is the other key new result, we compare
$\Omega_\infty$ and $\Omega_N$: that is a direct consequence of the following identity

$\forall\, F^N, G^N \in P_{sym}(E^N), \quad W_1(G^N, F^N) = \mathcal{W}_1(\hat G^N, \hat F^N)$,

applied to $F^N := f^{\otimes N}$, and a functional version of the law of large numbers.

Summary of Section 3. A somewhat stronger notion of chaos can be formulated in terms
of entropy functionals. Such a notion has been explicitly introduced by Carlen, Carvalho,
Le Roux, Loss, Villani in [17] (in the context of probability measures with support on the
"Kac's spheres") but it is reminiscent of the works [42, 6]. We also refer to [64, 53, 13, 14]
where the $N$-particle entropy functional below is widely used in order to identify the
possible limits for a system of $N$ particles as $N \to \infty$. Consider $E \subset \mathbb{R}^d$ an open set
or the closure of an open set, so that the gradient of a function may be well
defined. For a (smooth and/or decaying enough) probability measure $G^N \in P_{sym}(E^N)$ we
define (see section 3 for the suitable definitions) the Boltzmann entropy and the Fisher
information by

$H(G^N) := \frac{1}{N} \int_{E^N} G^N \log G^N \, dX, \qquad I(G^N) := \frac{1}{N} \int_{E^N} \frac{|\nabla G^N|^2}{G^N} \, dX$.

It is worth emphasizing that, contrary to the most usual convention, adopted for instance
in [17, Definition 8], we have put the normalizing factor $1/N$ in the definitions of the entropy
and the Fisher information. Moreover, we use the same notation for these functionals
whatever the dimension. As a consequence, we have $H(f^{\otimes N}) = H(f)$ and $I(f^{\otimes N}) = I(f)$
for any probability measure $f \in P(E)$.
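
The effect of the $1/N$ normalization can be checked numerically on a toy case. The sketch below is only an illustration under a simplifying assumption that is not in the text: $f$ is replaced by a discrete law on three points, and the integral by a finite sum (the function names are hypothetical).

```python
import itertools
import math

def entropy(weights):
    """Sum of w*log(w) over positive weights (Boltzmann sign convention)."""
    return sum(w * math.log(w) for w in weights if w > 0)

def normalized_product_entropy(f, n):
    """(1/n) times the entropy of the n-fold tensor product of the discrete law f."""
    # Each weight of the tensor product is a product of n weights of f.
    product_weights = (math.prod(combo) for combo in itertools.product(f, repeat=n))
    return entropy(product_weights) / n

# Toy discrete law standing in for f (assumption made for illustration only).
f = [0.5, 0.3, 0.2]
h_single = entropy(f)
h_product = normalized_product_entropy(f, 4)
```

With this normalization `h_product` agrees with `h_single` up to rounding, which is the discrete counterpart of $H(f^{\otimes N}) = H(f)$.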

Definition 1.3. Consider $(G^N)$ a sequence of $P_{sym}(E^N)$ such that for some $k > 0$ the $k$-th
moment $M_k(G^N_1)$ is uniformly bounded in $N$, and $f \in P(E)$. We say that

(a) $(G^N)$ is $f$-entropy chaotic (or $f$-chaotic in the sense of the Boltzmann entropy) if

$G^N_1 \rightharpoonup f$ weakly in $P(E)$ and $H(G^N) \to H(f)$, $H(f) < \infty$;

(b) $(G^N)$ is $f$-Fisher information chaotic (or $f$-chaotic in the sense of the Fisher
information) if

$G^N_1 \rightharpoonup f$ weakly in $P(E)$ and $I(G^N) \to I(f)$, $I(f) < \infty$.

Our second main result is the following qualitative comparison of the three above notions 
of chaos convergence. 

Theorem 1.4. Assume $E = \mathbb{R}^d$, $d \ge 1$, or $E$ is a bi-Lipschitz volume preserving
deformation of a convex set of $\mathbb{R}^d$, $d \ge 1$. Consider $(G^N)$ a sequence of $P_{sym}(E^N)$ such that
the $k$-th moment $M_k(G^N_1)$ is bounded, $k > 2$, and $f \in P(E)$.
In the list of assertions below, each one implies the assertion which follows:

(i) $(G^N)$ is $f$-Fisher information chaotic;

(ii) $(G^N)$ is $f$-Kac's chaotic and $I(G^N)$ is bounded;

(iii) $(G^N)$ is $f$-entropy chaotic;

(iv) $(G^N)$ is $f$-Kac's chaotic.

More precisely, the following quantitative estimate of the implication (ii) $\Rightarrow$ (iii) holds:

(1.8)    $|H(G^N) - H(f)| \le C_E \, K \, \Omega_N(G^N; f)^\gamma$,

with $\gamma := 1/2 - 1/k$, $K := \sup_N I(G^N)^{1/2} \, \sup_N M_k(G^N_1)^{1/k}$, and $C_E$ a constant depending
on the set $E$ (one can choose $C_E = 8$ when $E = \mathbb{R}^d$).

The implication (ii) $\Rightarrow$ (iii) is the most interesting part and the hardest step in the proof
of Theorem 1.4. It is based on estimate (1.8), which is a direct consequence of the HWI
inequality of Otto and Villani, proved in [61] when $E = \mathbb{R}^d$,
together with our equivalence of chaos convergences previously established. It is also
the most restrictive one in terms of moment bounds: the implication (ii) $\Rightarrow$ (iii) requires
a $k$-th moment bound of order $k > 2$, while the other implications only require a $k$-th
moment bound of order $k > 0$ or no moment bound condition (we refer to the proof
of Theorem 1.4 in section 3 for details). The proofs of the implications (i) $\Rightarrow$ (ii) and
(iii) $\Rightarrow$ (iv) use the fact that the subadditivity inequalities of the Fisher information and
of the entropy are saturated if and only if the probability measure is a tensor product.
For functionals involving the entropy, similar ideas are classical and they have been used
in [53, 37, 76, 13, 58] for instance.

We believe that this result gives a better understanding of the different notions of chaos.
Other but related notions of entropy chaos are introduced and discussed in [17, 56]. The
entropy chaos definition in [17], which consists in asking for points (iii) and (iv) above, is
in fact equivalent to ours thanks to Theorem 1.4.

It is worth emphasizing that Theorem 1.4 may be very useful in order to obtain entropic
propagation of chaos (possibly with a rate estimate) in contexts where some bound on the
Fisher information is available and propagation of Kac's chaos is already proved.
Unfortunately, a bound on the Fisher information is not easy to propagate for $N$-particle systems.
However, for the so-called "Maxwell molecules cross-section", following the proof of the
fact that the Fisher information decreases in time for solutions to the homogeneous
nonlinear Boltzmann equation [48, 70, 74] and for solutions to the homogeneous nonlinear
Landau equation [75], it has been established that the $N$-particle Fisher information also
decreases in time for the law of solutions to the stochastic Kac-Boltzmann jump
process in [55, Lemma 7.4] and for the law of solutions to the stochastic Kac-Landau diffusion
process in [20]. In these particular cases, Theorem 1.4 provides a quantitative version of
the entropic propagation of chaos proved in [55], and we refer to [19, 20] for details.

Summary of Section 4. Here we consider the framework of probability measures with
support on the "Kac's spheres" $\mathcal{KS}_N$ defined by

$\mathcal{KS}_N := \{V = (v_1, \dots, v_N) \in \mathbb{R}^N,\ v_1^2 + \dots + v_N^2 = N\}$,

as first introduced by Kac in [41]. Our aim is mainly to revisit the recent work [17] and
to develop "quantitative" versions of the chaos analysis.

We start by proving a quantified "Poincaré Lemma" establishing that the sequence of
uniform probability measures $\sigma^N$ on $\mathcal{KS}_N$ is $\gamma$-Kac's chaotic, with $\gamma$ the standard Gaussian
on $\mathbb{R}$, i.e. $\gamma(v) = (2\pi)^{-1/2} \exp(-|v|^2/2)$, in the sense that we prove a rate of convergence to
$0$ for the quantification of chaos $\Omega_N(\sigma^N; \gamma)$. We also prove that for a large class of probability
densities $f \in P(E)$ the corresponding sequence $(F^N)$ of "conditioned to the Kac's spheres
product measures" (see section 4.2 for the precise definition) is $f$-Kac's chaotic in the
sense that we prove a rate of convergence to $0$ for the quantification of chaos $\Omega_2(F^N; f)$.
That last result generalizes the "Poincaré Lemma" since $f = \gamma$ implies $F^N = \sigma^N$. The
main argument in the last result is a (maybe new) $L^\infty$ optimal rate version of the
Berry-Esseen theorem, also called local central limit theorem, which is nothing but a more accurate
(but less general) version of [17, Theorem 27]. Together with Theorem 1.2, or the more
accurate version of it stated in section 2, we obtain the following estimates.

Theorem 1.5. The sequence $(\sigma^N)$ of uniform probability measures on the "Kac's spheres"
is $\gamma$-Kac's chaotic, and more precisely



(1.9) ViV>l n 2 (a N ; 7 )<^, n N (a N - n ) < ^(^7) < C 3 ^fi-, 

JM iV2 jV~2 

for some numerical constants $C_i$, $i = 1, 2, 3$.

More generally, consider $f \in P(\mathbb{R})$ with bounded moment $M_k(f)$ of order $k \ge 6$ and
bounded Lebesgue norm $\|f\|_{L^p}$ of exponent $p > 1$. Then, the sequence $(F^N)$ of
associated "conditioned (to the Kac's spheres) product measures" is $f$-Kac's chaotic, and more
precisely

(1.10) ViV>l Q 2 (F N ; f) < — r> n N (F N ;f)<-^r, fi^i^;/) < 

AT2 N2 N2 

for any $\gamma \in (0, (2 + 2/k)^{-1})$ and for some constants $C_i = C_i(f, \gamma, k)$, $i = 4, 5, 6$.
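
To get a feeling for the uniform measure $\sigma^N$, one can sample it numerically. The sketch below is an illustration only and is not taken from the paper: it relies on the standard fact that a standard Gaussian vector rescaled to norm $\sqrt{N}$ is uniformly distributed on the sphere (by rotation invariance), and the function name and the sample sizes are arbitrary choices.

```python
import math
import random

def sample_kac_sphere(n, rng):
    """Draw one point uniformly on {v in R^n : v_1^2 + ... + v_n^2 = n}."""
    # A standard Gaussian vector has a rotation-invariant law, so rescaling it
    # to Euclidean norm sqrt(n) gives the uniform law on the sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    scale = math.sqrt(n) / norm
    return [scale * x for x in g]

rng = random.Random(0)
n = 50
v = sample_kac_sphere(n, rng)
energy = sum(x * x for x in v)  # equals n up to rounding

# Monte Carlo estimate of the second moment of the first coordinate, which
# equals 1 by exchangeability and the constraint defining the sphere.
first_coords = [sample_kac_sphere(n, rng)[0] for _ in range(2000)]
second_moment = sum(x * x for x in first_coords) / len(first_coords)
```

The second moment of one coordinate matches that of the standard Gaussian $\gamma$, which is consistent with (but of course much weaker than) the chaos statement of the theorem.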

Let us briefly discuss that last result. Establishing the convergence in
the empirical law of large numbers associated to i.i.d. samples is an important question
in theoretical statistics known as the Glivenko-Cantelli theorem, and the historical references
seem to be [33, 15, 71]. Next, the question of establishing rates of convergence in MKW
distance for the above convergence has been addressed for instance in [28, 1, 26, 62, 55, 10],
while the optimality of these rates has been considered for instance in [1, 68, 26, 4]. We
refer to [4, 10] and the references therein for a recent discussion on that topic. With our
notations, the question consists in establishing the estimate

(1.11)    $E(W_1(\mu^N_{\mathcal{X}}, f)) = \Omega_\infty(f^{\otimes N}; f) \le \frac{C}{N^\zeta}$,

for some constants $C = C(f)$ and $\zeta = \zeta(f)$. In the above left hand side term, $\mathcal{X}$ is an
$E^N$-valued random vector with independent coordinates with identical law $f$ or, equivalently,
$\mathcal{X}(X) = X$ is the identity vector in $E^N$ and $E$ is the expectation associated to the tensor
product probability measure $f^{\otimes N}$. When $E = \mathbb{R}^d$, estimate (1.11) has been proved to
hold with $\zeta = 1/d$, if $d \ge 3$ and $\mathrm{supp}\, f$ is compact, in [26]; with $\zeta < \zeta_c := (d' + d'/k)^{-1}$,
$d' = \max(d, 2)$, if $d \ge 1$ and $M_k(f) < \infty$, in [55]; and with $\zeta = \zeta_c$ if furthermore $d \ge 3$, in
[10].

To our knowledge, (1.9) and (1.10) are the first rates of convergence in MKW distance for
the empirical law of large numbers associated to a triangular array $\mathcal{X}^N$ whose coordinates
are not i.i.d. random variables but only Kac's chaotic exchangeable random variables.
The question of the optimality of the rates in (1.9) and (1.10) is an open (and we believe
interesting) problem.

Now, following [17], we introduce the notion of entropy chaos and Fisher information
chaos in the context of the "Kac's spheres" as follows. For any $j \in \mathbb{N}$, and $f, g \in P(E^j)$,
we define the usual relative entropy and usual relative Fisher information

$H(f|g) := \frac{1}{j} \int_{E^j} u \log u \; g(dv), \qquad I(f|g) := \frac{1}{j} \int_{E^j} \frac{|\nabla u|^2}{u} \, g(dv), \qquad u := \frac{df}{dg}$,

where $u = \frac{df}{dg}$ stands for the Radon-Nikodym derivative of $f$ with respect to $g$.

For $f \in P(E)$ and $G^N \in P_{sym}(\mathcal{KS}_N)$ such that $G^N_1 \rightharpoonup f$ weakly in $P(E)$, we say that
$(G^N)$ is

(a') $f$-entropy chaotic if $H(G^N|\sigma^N) \to H(f|\gamma)$, $H(f|\gamma) < \infty$;

(b') $f$-Fisher information chaotic if $I(G^N|\sigma^N) \to I(f|\gamma)$, $I(f|\gamma) < \infty$.

In a next step, we prove that for a large class of probability measures $f \in P(\mathbb{R})$ the
sequence $(F^N)$ of associated "conditioned (to the Kac's spheres) product measures" is
$f$-entropy chaotic as well as $f$-Fisher information chaotic, and we again exhibit rates for
these convergences. The proof is mainly a careful rewriting and simplification of the proofs
of the similar results (given without rate) in Theorems 9, 10, 19, 20 & 21 in [17].

We next generalize Theorem 1.4 to the Kac's spheres context. In addition to the
arguments already mentioned, we use a general version of the HWI inequality proved by Lott and
Villani in [46], see also [73, Theorem 30.21], and some entropy and Fisher inequalities on
the Kac's spheres established by Carlen et al. [18] and improved by Barthe et al. [3].

All these results are motivated by the question of giving a quantified strong version of
propagation of chaos for the Boltzmann-Kac jump model studied in [55] by Mouhot and the
second author, where only quantitative uniform in time Kac's chaos is established. As
a matter of fact, K. Carrapatoso in [19] extends the present analysis to probability
measures supported on the Boltzmann's spheres and proves a quantitative propagation
result of entropy chaos.

Another outcome of our results is that we are able to give the following possible answer 
to [17, Open problem 11]: 

Theorem 1.6. Consider $(G^N)$ a sequence of $P_{sym}(\mathbb{R}^N)$ with support on the Kac's spheres
$\mathcal{KS}_N$ such that

(1.12)    $M_k(G^N_1) \le C, \qquad I(G^N|\sigma^N) \le C$,

for some $k \ge 2$ and $C > 0$. Also consider $f \in P(\mathbb{R})$, satisfying $\int v^2 f(v) \, dv = 1$ and

(1.13)    $f \ge \exp(-\alpha |v|^{k'} + \beta)$ on $\mathbb{R}$,

with $0 < k' < k$, $\alpha > 0$, $\beta \in \mathbb{R}$. If $(G^N)$ is $f$-Kac's chaotic, then for any fixed $j \ge 1$, there
holds

$H(G^N_j | f^{\otimes j}) \to 0$ as $N \to \infty$,

where $H(\cdot|\cdot)$ stands for the usual relative entropy functional defined in the flat space $E^j$.
Remark that the boundedness of the $k$-th moment of $G^N_1$ is useless when $k \le 2$ (because
the support condition implies $M_2(G^N_1) = 1$) while the condition on the second moment of
$f$ is useless if $k > 2$ (because it is inherited from the properties of $(G^N)$).

In contrast to the conditioned tensor product assumption made in [17, Theorem 9], which
can be assumed at initial time for the stochastic Kac-Boltzmann process but which is not
propagated in time, our assumptions (1.12) and (1.13) in Theorem 1.6, which may
seem to be stronger in some sense, are in fact more natural since they are propagated
in time. We refer to [55, 19] where such problems are studied.

Summary of Section 5. Here we investigate how our techniques can be used in the
context of mixtures of probability measures as introduced by De Finetti, Hewitt and Savage
[24, 40] and general sequences of probability densities $G^N$ of $N$ indistinguishable particles
as $N \to \infty$, without assuming chaos, as is the case in [53, 13, 14] for instance. The
results developed in that section are also used in a fundamental way in the recent work



In a first step, we give a new proof of the De Finetti, Hewitt and Savage theorem, which is
based on the use of the law of the empirical measure associated to the first $j$ coordinates,
as in Diaconis and Freedman's proof [25] or Lions' proof [44], but where the compactness
arguments are replaced by a completeness argument. As a by-product, we give a
quantified equivalence of several notions of convergence of sequences of $P_{sym}(E^N)$ to their
possible mixture limit.

In a second step, we revisit the level 3 entropy and level 3 Fisher information theory 
for a probability measures mixture as developed since the work by Robinson and Ruelle 
[64] at least. We give a comprehensive and elementary proof of the fundamental result 



for any mixture of probability measures $\pi \in P_k(P(E))$, $k > 0$ (see paragraph 5.1 where
the space $P_k(P(E))$ is defined), where $\pi_j$ stands for the De Finetti, Hewitt and Savage
projection of $\pi$ on the first $j$ coordinates and $K$ stands for the Boltzmann entropy or the
Fisher information functional. It is worth noticing that while the representation formula
(1.14) is well known when $K$ stands for the Boltzmann entropy, we believe that it is
new when $K$ stands for the Fisher information. The representation formula for the Fisher
information is interesting for its own sake and it has also found an application as a key
argument in the proof of propagation of chaos for systems of vortices established in [32].

In our last result we establish a rate of convergence for the above limit (1.14) when $K$ is
the entropy functional, mainly under a boundedness hypothesis on the Fisher information,
and we generalize such a quantitative result by establishing links between several weak notions
of convergence as well as the strong (entropy) notion of convergence for sequences of probability
densities $G^N \in P_{sym}(E^N)$ as $N \to \infty$, without assuming chaos.

2. Kac's chaos

In this section we show the equivalence between several ways to measure Kac's chaos as
stated in Theorem 1.2. We start by presenting the framework we will deal with in the sequel,
thus making precise the definitions and notations used in the introductory section.

2.1. Definitions and notations. In all the sequel, we denote by $E$ a closed subset of
$\mathbb{R}^d$, $d \ge 1$, endowed with the usual topology, so that it is a locally compact Polish space.
We denote by $P(E)$ the space of probability measures on the Borel $\sigma$-algebra $\mathcal{B}_E$ of $E$.

Monge-Kantorovich-Wasserstein (MKW) distances.

As they will be a cornerstone of this article, used in different settings, we briefly recall
their definition and main properties, and refer to [72] for a very nice presentation.

On a general Polish space $Z$, for any distance $D : Z \times Z \to \mathbb{R}_+$ and $p \in [1, \infty)$, we define
$W_{D,p}$ on $P(Z) \times P(Z)$ by setting for any $\rho_1, \rho_2 \in P(Z)$

$[W_{D,p}(\rho_1, \rho_2)]^p := \inf_{\pi \in \Pi(\rho_1, \rho_2)} \int_{Z \times Z} D(x,y)^p \, \pi(dx, dy)$,

where $\Pi(\rho_1, \rho_2)$ is the set of probability measures $\pi \in P(Z \times Z)$ with first marginal $\rho_1$ and
second marginal $\rho_2$, that is $\pi(A \times Z) = \rho_1(A)$ and $\pi(Z \times A) = \rho_2(A)$ for any Borel set
$A \subset Z$. It defines a distance on $P(Z)$.

The phase spaces $E^N$ (and its marginal spaces $E^j$) and $P(E)$.

When we study systems of $N$ particles, the natural phase space is $E^N$. The spaces of
marginals $E^j$ for $1 \le j \le N$ are also important. We present here the different distances
we shall use on these spaces.

• On $E$ we will mainly use two distances:

- the usual Euclidean distance denoted by $|x - y|$;

- a bounded version of that distance: $d_E(x, y) = |x - y| \wedge 1$ for any $x, y \in E$.

• On the space $E^j$ for $1 \le j$, we will also use two distances:

- the normalized square distance $|X - Y|_2$ defined for any $X = (x_1, \dots, x_j) \in E^j$
and $Y = (y_1, \dots, y_j) \in E^j$ by

$|X - Y|_2^2 := \frac{1}{j} \sum_{i=1}^j |x_i - y_i|^2$;

- the normalized bounded distance $d_j = d_{E^j}$ defined by

(2.1)    $d_{E^j}(X, Y) := \frac{1}{j} \sum_{i=1}^j d_E(x_i, y_i)$.

It is worth emphasizing that the normalizing factor $1/j$ is important in the sequel in order
to obtain formulas independent of the number $j$ of variables.
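
The role of the factor $1/j$ in (2.1) can be seen on a small example. The sketch below is an illustration only, under the assumption $E \subset \mathbb{R}$ (so that points are plain floats); the function names `d_E` and `d_Ej` are hypothetical and simply mirror the notation above.

```python
def d_E(x, y):
    """Bounded distance on E (here a subset of R): |x - y| truncated at 1."""
    return min(abs(x - y), 1.0)

def d_Ej(X, Y):
    """Normalized bounded distance (2.1) between two configurations of equal size j."""
    if len(X) != len(Y):
        raise ValueError("configurations must have the same number of coordinates")
    return sum(d_E(x, y) for x, y in zip(X, Y)) / len(X)

X = [0.0, 0.0]
Y = [0.5, 3.0]
dist = d_Ej(X, Y)                  # (0.5 + 1.0) / 2
dist_repeated = d_Ej(X * 3, Y * 3) # same pairs repeated three times
```

Repeating every pair of coordinates three times leaves the normalized distance unchanged, and the value never exceeds 1 whatever $j$ is, which is what makes estimates uniform in the number of variables possible.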

• The introduction of the empirical measures allows us to "identify" our phase space $E^N$ with a
subspace of $P(E)$. To be more precise, we denote by $\mathcal{P}_N(E)$ the set of empirical measures

$\mathcal{P}_N(E) := \{\mu^N_X,\ X = (x_1, \dots, x_N) \in E^N\} \subset P(E)$,

where $\mu^N_X$ stands for the empirical measure defined by (1.3) and associated to the
configuration $X = (x_1, \dots, x_N) \in E^N$. We denote by $p_N : E^N \to \mathcal{P}_N(E)$ the application that
maps a configuration to its empirical measure: $p_N(X) := \mu^N_X$.
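
The map from a configuration to its empirical measure forgets the labelling of the particles. The sketch below illustrates this on finite configurations; representing a measure as a dictionary from points to rational weights is a choice made here for illustration, not something taken from the text, and the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def empirical_measure(X):
    """Empirical measure (1.3) of a configuration X, as a dict point -> weight."""
    n = len(X)
    counts = Counter(X)
    # Each atom x carries the mass (number of occurrences of x) / n.
    return {x: Fraction(c, n) for x, c in counts.items()}

mu = empirical_measure((1, 2, 2))
mu_permuted = empirical_measure((2, 1, 2))
total_mass = sum(mu.values())
```

Two configurations that differ by a permutation of coordinates give the same measure, so the map cannot be injective on $E^N$; this is why the identification is meaningful only for symmetric objects.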

• On our phase space $P(E)$, we will use three different distances:

- The usual MKW distance of order two $W_2$, defined as above with the choice $D(x, y) = |x - y|$:

$W_2(\rho_1, \rho_2)^2 = W_{|\cdot|,2}(\rho_1, \rho_2)^2 := \inf_{\pi \in \Pi(\rho_1, \rho_2)} \int_{E \times E} |x - y|^2 \, \pi(dx, dy)$.

- The MKW distance $W_1$ associated to $d_E$, defined by

(2.2)    $W_1(\rho_1, \rho_2) = W_{d_E,1}(\rho_1, \rho_2) := \inf_{\pi \in \Pi(\rho_1, \rho_2)} \int_{E \times E} d_E(x, y) \, \pi(dx, dy)$.

From the Kantorovich-Rubinstein duality theorem (see for instance [72, Theorem 1.14])
we have the following alternative characterization

(2.3)    $\forall\, \rho_1, \rho_2 \in P(E), \quad W_1(\rho_1, \rho_2) = \sup_{\|\varphi\|_{Lip} \le 1} \int_E \varphi(x) \, (\rho_1(dx) - \rho_2(dx))$,

where $\|\varphi\|_{Lip} := \sup_{x \ne y} \frac{|\varphi(x) - \varphi(y)|}{d_E(x, y)}$ is the Lipschitz semi-norm relative to the distance
$d_E$. This semi-norm is closely related to the usual Lipschitz semi-norm since it satisfies

(2.4)    $\frac{1}{2} \big( \|\nabla \varphi\|_\infty + \|\varphi - \varphi(0)\|_\infty \big) \le \|\varphi\|_{Lip} \le 2 \big( \|\nabla \varphi\|_\infty + \|\varphi\|_\infty \big) =: 2 \|\varphi\|_{W^{1,\infty}}$.

It implies that $W_1$ is equivalent to the $(W^{1,\infty})'$-distance, denoted by $D_{W^{1,\infty}}$,

$D_{W^{1,\infty}}(\rho_1, \rho_2) := \sup_{\|\varphi\|_{W^{1,\infty}} \le 1} \int_E \varphi(x) \, (\rho_1(dx) - \rho_2(dx))$,

and more precisely

(2.5)    $\frac{1}{2} D_{W^{1,\infty}} \le W_1 \le 2 D_{W^{1,\infty}}$.

- The distance induced by the $H^{-s}$ norm for $s > d/2$: for any $\rho, \eta \in P(E)$,

$\|\rho - \eta\|^2_{H^{-s}} := \int_{\mathbb{R}^d} |\hat\rho(\xi) - \hat\eta(\xi)|^2 \, \frac{d\xi}{\langle \xi \rangle^{2s}}$,

where $\hat\rho$ denotes the Fourier transform of $\rho$ (which may always be seen as a measure on
the whole $\mathbb{R}^d$), and $\langle \xi \rangle = \sqrt{1 + |\xi|^2}$.

• We will often restrict ourselves to the spaces $P_k(E)$ of probability measures with finite
moment of order $k > 0$ defined by

$P_k(E) := \{\rho \in P(E) \ \text{s.t.} \ M_k(\rho) := \int_E \langle v \rangle^k \, \rho(dv) < +\infty\}$.

The probability measures space $P(E^N)$, its marginal spaces $P(E^j)$, and $P(P(E))$.
The next step is to consider probability measures on the configuration spaces.

• The space $P(E^N)$ will be endowed with two distances:

- $W_1$, the MKW distance on $P(E^N)$ associated to $d_{E^N}$ and $p = 1$, which has the
same properties as the one constructed on $P(E)$ and satisfies in particular the
Kantorovich-Rubinstein formulation (2.3).

- $W_2$, the MKW distance associated to the normalized square distance $|\cdot|_2$ defined
above.

Remark that we will only work on the subspace $P_{sym}(E^N)$ of Borel probability measures
which are invariant under coordinates permutations.

• On the probability measures space $P(P(E))$, we can define different distances thanks to
the Monge-Kantorovich-Wasserstein construction. We will use three of them:

- $\mathcal{W}_1$, the MKW distance induced by the cost function $W_1$ on $P(E)$. In short

(2.6)    $\mathcal{W}_1(\alpha_1, \alpha_2) = W_{W_1,1}(\alpha_1, \alpha_2) := \inf_{\pi \in \Pi(\alpha_1, \alpha_2)} \int_{P(E) \times P(E)} W_1(\rho_1, \rho_2) \, \pi(d\rho_1, d\rho_2)$,

- $\mathcal{W}_2$, the MKW distance induced by the cost function $W_2^2$ on $P(E)$. In short

$\mathcal{W}_2(\alpha_1, \alpha_2)^2 = W_{W_2,2}(\alpha_1, \alpha_2)^2 := \inf_{\pi \in \Pi(\alpha_1, \alpha_2)} \int_{P(E) \times P(E)} W_2^2(\rho_1, \rho_2) \, \pi(d\rho_1, d\rho_2)$,

- $\mathcal{W}_{H^{-s}}$, the MKW distance induced by the cost function $\|\cdot\|^2_{H^{-s}}$ on $P(E)$. In short

$\mathcal{W}_{H^{-s}}(\alpha_1, \alpha_2)^2 = W_{\|\cdot\|_{H^{-s}},2}(\alpha_1, \alpha_2)^2 := \inf_{\pi \in \Pi(\alpha_1, \alpha_2)} \int_{P(E) \times P(E)} \|\rho_1 - \rho_2\|^2_{H^{-s}} \, \pi(d\rho_1, d\rho_2)$.

• Remark that the "empirical measure" map $p_N : X \mapsto \mu^N_X$ allows us to define by push-forward a canonical map between $P(E^N)$ and $P(P(E))$. For $G^N \in P(E^N)$ we denote its image under the map $p_N$ by $\hat G^N \in P(P(E))$: $\hat G^N := G^N_\sharp\, p_N$. In other words, $\hat G^N$ is the unique probability measure in $P(P(E))$ which satisfies the duality relation

$$ \forall\, \Phi \in C_b(P(E)), \qquad \langle \hat G^N, \Phi \rangle = \int_{E^N} \Phi(\mu^N_X)\, G^N(dX). \tag{2.7} $$

More properties of the space $P(P(E))$.

• Marginals of probability measures on $P(P(E))$. We can define a mapping from $P(P(E))$ onto $P(E^j)$ in the following way. For any $\alpha \in P(P(E))$ we define the projection $\alpha_j \in P(E^j)$ thanks to the relation

$$ \alpha_j := \int_{P(E)} \rho^{\otimes j}\, d\alpha(\rho). $$

It may also be restated using polynomial functions: for any $\varphi \in C_b(E^j)$ we define the monomial (of order $j$) function $R_\varphi \in C_b(P(E))$ by

$$ \forall\, \rho \in P(E), \qquad R_\varphi(\rho) := \int_{E^j} \varphi(X)\, \rho^{\otimes j}(dX). \tag{2.8} $$

We remark that the monomial functions of all orders generate an algebra of continuous functions (for the weak convergence of measures) that are called polynomials. When $E$ is compact, so that $P(E)$ is also compact, they form a dense subset of $C_b(P(E))$ thanks to the Stone-Weierstrass theorem.

In terms of polynomial functions, the marginal $\alpha_j$ may be defined by

$$ \forall\, \varphi \in C_b(E^j), \qquad \langle \alpha_j, \varphi \rangle := \langle \alpha, R_\varphi \rangle. $$

• Starting from $G^N \in P_{sym}(E^N)$, we can define its push-forward $\hat G^N$ and then for any $1 \le j \le N$ the marginals of the push-forward $\hat G^N_j := (\hat G^N)_j \in P_{sym}(E^j)$. They satisfy the duality relation

$$ \forall\, \varphi \in C_b(E^j), \qquad \langle \hat G^N_j, \varphi \rangle := \int_{E^N} R_\varphi(\mu^N_X)\, G^N(dX). \tag{2.9} $$

We emphasize that $\hat G^N_j$ is not equal to the $j$-th marginal $G^N_j$ of $G^N$, but we will see later that the two probability measures $G^N_j$ and $\hat G^N_j$ are close (a precise version is recalled in Lemma 2.8).
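To make the difference between $G^N_j$ and $\hat G^N_j$ concrete, here is a minimal sketch on the two-point space $E = \{0,1\}$ with $G^N = f^{\otimes N}$ and $f$ a Bernoulli law. The helper names (`product_law`, `marginal_2`, `hat_marginal_2`, `total_variation`) are hypothetical and chosen only for this illustration; the total variation norm is taken as the $\ell^1$ distance of the weights, which is an assumption about the normalization.

```python
from fractions import Fraction
from itertools import product

def product_law(p, N):
    """G^N = f^{⊗N} on {0,1}^N, with f = Bernoulli(p)."""
    law = {}
    for X in product((0, 1), repeat=N):
        ones = sum(X)
        law[X] = p**ones * (1 - p)**(N - ones)
    return law

def marginal_2(GN):
    """Usual 2-marginal G^N_2: law of (x_1, x_2) under G^N."""
    out = {}
    for X, w in GN.items():
        out[X[:2]] = out.get(X[:2], 0) + w
    return out

def hat_marginal_2(GN, N):
    """2-marginal of the push-forward: average of mu_X ⊗ mu_X under G^N."""
    out = {(a, b): Fraction(0) for a in (0, 1) for b in (0, 1)}
    for X, w in GN.items():
        for a, b in out:
            out[(a, b)] += w * Fraction(X.count(a), N) * Fraction(X.count(b), N)
    return out

def total_variation(m1, m2):
    """TV norm taken as the l^1 distance between the weights."""
    return sum(abs(m1[key] - m2[key]) for key in m1)

p, N = Fraction(1, 3), 5
GN = product_law(p, N)
tv = total_variation(marginal_2(GN), hat_marginal_2(GN, N))
# The two marginals differ, but only at order 1/N.
print(tv, "<=", Fraction(2 * 2 * (2 - 1), N))
```

In this example the gap equals $4p(1-p)/N$, which is of the order $1/N$ announced above.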

Different quantities measuring chaoticity. Now that everything has been defined, we introduce the quantities that we will use to quantify the chaoticity of a sequence $G^N \in P_{sym}(E^N)$ of symmetric probability measures with respect to a profile $f \in P(E)$:

— The chaoticity can be measured on $E^j$ for $j \ge 2$. For any $1 \le j \le N$, we set

$$ \Omega_j(G^N; f) := W_1(G^N_j, f^{\otimes j}), $$

— and also on $P(E)$ by

$$ \Omega_\infty(G^N; f) := \mathcal{W}_1(\hat G^N, \delta_f) = \int_{E^N} W_1(\mu^N_X, f)\, G^N(dX), $$

since there is only one transference plan $\alpha \otimes \delta_f$ in $\Pi(\alpha, \delta_f)$.

2.2. Equivalence of distances on $P(E)$, $P_{sym}(E^N)$ and $P(P(E))$.

To quantify the equivalence between the distances defined above on $P(E)$, we will need some assumption on the moments. The metrics $W_1$, $W_2$ and $\|\cdot\|_{H^{-s}}$ are uniformly topologically equivalent in $P_k(E)$ for any $k > 0$. More precisely, we have

Lemma 2.1. Choose $f, g \in P(E)$. For any $k > 0$, denote $\mathcal{M}_k := M_k(f) + M_k(g)$.

(i) For any $k > 0$ and $s \ge 1$, there exists a constant $C := C(d,s)$, whose dependence on $s$ is explicit, such that there holds

$$ W_1(f,g) \le C\, \mathcal{M}_k^{\frac{d}{d+2ks}}\, \|f-g\|_{H^{-s}}^{\frac{2k}{d+2ks}}. \tag{2.10} $$

(ii) For any $k > 2$, there holds

$$ W_2(f,g) \le 2^{\frac32}\, \mathcal{M}_k^{1/k}\, W_1(f,g)^{1/2 - 1/k}. \tag{2.11} $$

(iii) Without moment assumptions and for any $s > \frac{d+1}{2}$, there exists a constant $C = C(s,d)$ such that there holds

$$ W_1(f,g) \le W_2(f,g), \qquad \|f-g\|_{H^{-s}} \le C\, W_1(f,g)^{1/2}. $$

We remark that we have kept the explicit dependence on $s$ of the constant appearing in (i) in order to be able to perform some optimization on $s$ later. The important point is that the constant may be chosen independent of $s$ if $s$ varies in a compact set.

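Proof of Lemma 2.1. The proof is a mere adaptation of classical results on comparison of distances in probability measures spaces, as can be found in [62, 21, 55] for instance. We nevertheless sketch it for the sake of completeness.

Proof of i). We consider a truncation sequence $\chi_R(x) = \chi(x/R)$, $R > 0$, with $\chi \in C_c^\infty(\mathbb{R}^d)$, $\|\nabla \chi\|_\infty \le 1$, $0 \le \chi \le 1$, $\chi \equiv 1$ on $B(0,1)$, and the sequence of mollifiers $\gamma_\varepsilon(x) = \varepsilon^{-d} \gamma(x/\varepsilon)$, $\varepsilon > 0$, with $\gamma(x) = (2\pi)^{-d/2} \exp(-|x|^2/2)$, so that $\hat\gamma_\varepsilon(\xi) = \exp(-\varepsilon^2 |\xi|^2/2)$. In view of the equivalence of distances (2.5), we choose $\varphi \in W^{1,\infty}(\mathbb{R}^d)$ such that $\|\varphi\|_{W^{1,\infty}} \le 1$, we define $\varphi_R := \varphi\, \chi_R$, $\varphi_{R,\varepsilon} := \varphi_R * \gamma_\varepsilon$ and we write

$$ \int \varphi\,(df - dg) = \int \varphi_{R,\varepsilon}\,(df - dg) + \int (\varphi_R - \varphi_{R,\varepsilon})\,(df - dg) + \int (\varphi - \varphi_R)\,(df - dg). $$

For the last term, we have

$$ \forall\, R > 0, \qquad \Big| \int (\varphi_R - \varphi)\,(df - dg) \Big| \le \int_{B_R^c} \|\varphi\|_\infty\, (df + dg) \le \frac{\mathcal{M}_k}{R^k}. $$

For the second term, we observe that

$$ \|\varphi_R - \varphi_{R,\varepsilon}\|_\infty \le \|\nabla \varphi_R\|_\infty \int_{\mathbb{R}^d} \gamma_\varepsilon(x)\, |x|\, dx \le C(d)\, \varepsilon, $$

and we get

$$ \Big| \int (\varphi_R - \varphi_{R,\varepsilon})\,(df - dg) \Big| \le C(d)\, \varepsilon. $$

Finally, the first term can be estimated by

$$ \Big| \int \varphi_{R,\varepsilon}\,(df - dg) \Big| \le \|\varphi_{R,\varepsilon}\|_{H^s}\, \|f - g\|_{H^{-s}}, $$

with, for any $R > 1$ and $\varepsilon \in (0,1]$,

$$ \|\varphi_{R,\varepsilon}\|_{H^s} = \Big( \int \langle\xi\rangle^2\, |\hat\varphi_R(\xi)|^2\, \langle\xi\rangle^{2(s-1)}\, |\hat\gamma_\varepsilon(\xi)|^2\, d\xi \Big)^{1/2} \le \|\varphi_R\|_{H^1}\, \|\langle\xi\rangle^{s-1}\, \hat\gamma_\varepsilon(\xi)\|_{L^\infty} \le C(d)\, R^{d/2}\, \|\langle\xi\rangle^{s-1}\, \hat\gamma_\varepsilon(\xi)\|_{L^\infty}. $$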

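The infinite norm is finite, and a simple optimization shows that it is bounded by a constant depending only on $s$ times $\varepsilon^{-(s-1)}$ (with the natural convention $0^0 = 1$ when $s = 1$). All in all, we have

$$ W_1(f,g) \le C(d,s)\, \Big( \varepsilon + \frac{\mathcal{M}_k}{R^k} + R^{d/2}\, \varepsilon^{-(s-1)}\, \|f-g\|_{H^{-s}} \Big). $$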

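This yields (2.10) by optimizing the parameters $\varepsilon$ and $R$ with

$$ R = \mathcal{M}_k^{\frac{2s}{d+2ks}}\, \|f - g\|_{H^{-s}}^{-\frac{2}{d+2ks}}, \qquad \text{and} \qquad \varepsilon = \mathcal{M}_k^{\frac{d}{d+2ks}}\, \|f - g\|_{H^{-s}}^{\frac{2k}{d+2ks}}. $$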
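The balancing behind this choice can be checked numerically. The sketch below assumes that the quantity to optimize has the three-term form $\varepsilon + \mathcal{M}_k/R^k + R^{d/2}\varepsilon^{-(s-1)}D$ with $D = \|f-g\|_{H^{-s}}$, and that the optimal parameters are $R = \mathcal{M}_k^{2s/(d+2ks)} D^{-2/(d+2ks)}$ and $\varepsilon = \mathcal{M}_k^{d/(d+2ks)} D^{2k/(d+2ks)}$; the function name `three_terms` is hypothetical.

```python
import math

def three_terms(M, D, k, s, d):
    """Return the three terms of the bound evaluated at the balancing (R, eps).

    M stands for the moment bound M_k, D for the H^{-s} distance.
    """
    p = d + 2 * k * s
    eps = M ** (d / p) * D ** (2 * k / p)
    R = M ** (2 * s / p) * D ** (-2 / p)
    moment_term = M / R**k
    hilbert_term = R ** (d / 2) * eps ** (-(s - 1)) * D
    return eps, moment_term, hilbert_term

terms = three_terms(M=3.0, D=1e-3, k=4, s=2.5, d=3)
# With this choice the three terms coincide, which gives the rate in (2.10).
print(terms, all(math.isclose(t, terms[0]) for t in terms))
```

All three terms are then equal to $\mathcal{M}_k^{d/(d+2ks)} D^{2k/(d+2ks)}$, so their sum is three times the right-hand side of (2.10) up to the constant.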



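Proof of ii). We have for any $R \ge 1$ the inequality

$$ \forall\, x, y \in E, \qquad |x-y|^2 \le R^2\, d_E(x,y) + \frac{2^k}{R^{k-2}}\, \big( |x|^k + |y|^k \big), $$

from which we deduce

$$ W_2(f,g)^2 \le R^2 \inf_{\pi \in \Pi(f,g)} \int_{E \times E} d_E(x,y)\, \pi(dx,dy) + \frac{2^k}{R^{k-2}} \sup_{\pi \in \Pi(f,g)} \int_{E \times E} \big( |x|^k + |y|^k \big)\, \pi(dx,dy) \le R^2\, W_1(f,g) + \frac{2^k}{R^{k-2}}\, \big( M_k(f) + M_k(g) \big), $$

and then we get with $(R/2)^k = \mathcal{M}_k / W_1(f,g)$

$$ W_2(f,g) \le 2^{3/2}\, \mathcal{M}_k^{1/k}\, W_1(f,g)^{1/2 - 1/k}. \tag{2.12} $$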
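The pointwise inequality used in this step, and the value of the constant $2^{3/2}$, can be sanity-checked numerically. The sketch below works in dimension one and assumes the bounded distance $d_E(x,y) = \min(|x-y|, 1)$, which is consistent with the later remark that the distance is bounded by $1$ but is an assumption here; the function names are hypothetical.

```python
import math
import random

def d_E(x, y):
    """Bounded distance on E (assumed here to be min(|x - y|, 1))."""
    return min(abs(x - y), 1.0)

def pointwise_rhs(x, y, R, k):
    """Right-hand side R^2 d_E(x,y) + 2^k R^(2-k) (|x|^k + |y|^k)."""
    return R**2 * d_E(x, y) + 2**k / R ** (k - 2) * (abs(x) ** k + abs(y) ** k)

def optimized_bound(M, W, k):
    """R^2 W + 2^k R^(2-k) M evaluated at (R/2)^k = M / W."""
    R = 2 * (M / W) ** (1 / k)
    return R**2 * W + 2**k / R ** (k - 2) * M

rng = random.Random(0)
ok = all(
    (x - y) ** 2 <= pointwise_rhs(x, y, R, k) + 1e-9
    for x, y, R, k in (
        (rng.uniform(-20, 20), rng.uniform(-20, 20), rng.uniform(1, 10), rng.uniform(2, 6))
        for _ in range(10000)
    )
)
print(ok)
# The optimized bound equals 8 M^(2/k) W^(1 - 2/k), i.e. (2^(3/2) M^(1/k) W^(1/2 - 1/k))^2.
print(math.isclose(optimized_bound(5.0, 0.01, 4), 8 * 5.0 ** 0.5 * 0.01 ** 0.5))
```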

Proof of iii). The first point is classical. The second relies on the fact that $\|\delta_x - \delta_y\|^2_{H^{-s}} \le C\, d_E(x,y)$. □

There is also a similar result on $E^N$, where the $H^{-s}$ norm is less useful.

Lemma 2.2. Choose $F^N, G^N \in P_{sym}(E^N)$. For any $k > 0$, denote

$$ \mathcal{M}_k := M_k(F^N_1) + M_k(G^N_1). $$

For any $k > 2$, it holds that

$$ W_2(F^N, G^N) \le 2^{\frac32}\, \mathcal{M}_k^{1/k}\, W_1(F^N, G^N)^{1/2 - 1/k}. \tag{2.13} $$

It also holds without moment assumptions that $W_1(F^N, G^N) \le W_2(F^N, G^N)$.

Proof of Lemma 2.2. The proof is a simple generalization of (2.11) to the case of $N$ variables. We skip it. □

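The inequalities of Lemma 2.1 also sum well on $P(P(E))$ in order to get

Lemma 2.3. Choose $\alpha, \beta \in P(P(E))$, and for $k > 0$ define

$$ \mathcal{M}_k := M_k(\alpha) + M_k(\beta) := \int M_k(\rho)\, [\alpha + \beta](d\rho) = M_k(\alpha_1) + M_k(\beta_1). $$

(i) For any $s \ge 1$ and with the same constant $C(d,s)$ as in point (i) of Lemma 2.1, we have for any $k > 0$,

$$ \mathcal{W}_1(\alpha,\beta) \le C\, \mathcal{M}_k^{\frac{d}{d+2ks}}\, \mathcal{W}_{H^{-s}}(\alpha,\beta)^{\frac{2k}{d+2ks}}. \tag{2.14} $$

(ii) For any $k > 2$, it also holds

$$ \mathcal{W}_2(\alpha,\beta) \le 2^{\frac32}\, \mathcal{M}_k^{1/k}\, \mathcal{W}_1(\alpha,\beta)^{1/2 - 1/k}. \tag{2.15} $$

(iii) It holds without moment assumption that $\mathcal{W}_1 \le \mathcal{W}_2$ and $\mathcal{W}_{H^{-s}} \le C\, \mathcal{W}_1^{1/2}$ for $s > \frac{d+1}{2}$, with a constant $C = C(s,d)$.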

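Proof of Lemma 2.3. All the above estimates are simple summations of the corresponding estimates of Lemma 2.1. We only prove i). We have

$$ \mathcal{W}_1(\alpha,\beta) = \inf_{\Pi \in \Pi(\alpha,\beta)} \int W_1(\rho,\eta)\, \Pi(d\rho,d\eta) $$
$$ \le C \inf_{\Pi \in \Pi(\alpha,\beta)} \int \big[ M_k(\rho) + M_k(\eta) \big]^{\frac{d}{d+2ks}}\, \|\rho - \eta\|_{H^{-s}}^{\frac{2k}{d+2ks}}\, \Pi(d\rho,d\eta) $$
$$ \le C\, \Big( \int M_k(\rho)\, [\alpha + \beta](d\rho) \Big)^{\frac{d}{d+2ks}} \inf_{\Pi \in \Pi(\alpha,\beta)} \Big( \int \|\rho - \eta\|_{H^{-s}}^{\frac1s}\, \Pi(d\rho,d\eta) \Big)^{\frac{2ks}{d+2ks}} $$
$$ \le C\, \big[ M_k(\alpha) + M_k(\beta) \big]^{\frac{d}{d+2ks}} \Big( \inf_{\Pi \in \Pi(\alpha,\beta)} \int \|\rho - \eta\|^2_{H^{-s}}\, \Pi(d\rho,d\eta) \Big)^{\frac{k}{d+2ks}}, $$

where we have successively used the inequality (2.10), the Hölder inequality, the definition of the moments of $\alpha$ and $\beta$, and the Jensen inequality. □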
2.3. Quantified equivalence of chaos. This section is devoted to the proof of Theorem 1.2, or more precisely, to the proof of the following accurate version of Theorem 1.2.

Theorem 2.4. For any $G^N \in P_{sym}(E^N)$ and $f \in P(E)$, there holds

$$ \text{(i)} \quad \forall\, 1 \le j \le \ell \le N, \qquad \Omega_j(G^N; f) \le 2\, \Omega_\ell(G^N; f), \tag{2.16} $$

$$ \text{(ii)} \quad \forall\, 1 \le j \le N, \qquad \Omega_j(G^N; f) \le \Omega_\infty(G^N; f) + \frac{j^2}{N}. \tag{2.17} $$

For any $k > 0$ and any $0 < \gamma < \frac{1}{d+1+d/k}$, there exists an explicit constant $C := C(d,\gamma,k)$ such that

$$ \text{(iii)} \quad \Omega_\infty(G^N; f) \le C\, \mathcal{M}_k^{1/k}\, \Big( \Omega_2(G^N; f) + \frac1N \Big)^{\gamma}, \tag{2.18} $$

where as usual $\mathcal{M}_k := M_k(f) + M_k(G^N_1)$.

For any $k > 0$ and any $0 < \gamma < \frac{1}{d' + d'/k}$, $d' = \max(d,2)$, there exists a constant $C := C(d,\gamma,k)$ such that

$$ \text{(iv)} \quad \big| \Omega_N(G^N; f) - \Omega_\infty(G^N; f) \big| \le C\, \frac{M_k(f)^{1/k}}{N^{\gamma}}. \tag{2.19} $$

Let us make some remarks about the above statement. Roughly speaking, the first two inequalities go in the good direction: the measure of chaos for a certain number of particles is bounded by the measure of chaos with more particles, and even in the sense of empirical measures (i.e. with $\Omega_\infty$). Let us however observe that the second inequality is meaningful only when the number $j$ of particles in the left hand side is not too high, typically $j = o(\sqrt{N})$. The third inequality goes in the "bad direction" and it is maybe the most important one, since it provides an estimate of the measure of chaos in the sense of empirical measures by the measure of chaos for two particles only. It is for instance a key ingredient in [55]. See also Corollary 2.11 for versions adapted to probability measures with compact support or with exponential moment. The last inequality compares the measure of chaos at $N$ particles to its measure in the sense of empirical distributions. It seems new and it will be a key argument in the next sections in order to make links between Kac's chaos, the entropy chaos and the Fisher information chaos.

Remark 2.5. In the inequality (2.18), the $\Omega_2$ term in the right hand side may be replaced by any $\Omega_\ell$ for $\ell \ge 2$, but it cannot be replaced by $\Omega_1$, which does not measure chaoticity, as is well known. We give a counter-example for the sake of completeness. We choose $g$ and $h$ two distinct probability measures on $E$, and take $f := \frac12 (g + h)$. We consider the probability measure $G \in P(P(E))$, and its associated sequence $(G^N)$ of marginal probability measures on $P(E^N)$ defined by

$$ G = \frac12\, (\delta_g + \delta_h), \qquad G^N := \frac12\, g^{\otimes N} + \frac12\, h^{\otimes N}. $$

As $G^N_1 = f$, we have $\Omega_1(G^N, f) = 0$ for all $N$, so inequality (2.18) with $\Omega_2$ replaced by $\Omega_1$ would imply that $\Omega_\infty(G^N, f)$ goes to zero. But from inequality (2.17) of Theorem 2.4

$$ W_1(G^N_2, f^{\otimes 2}) = \Omega_2(G^N, f) \le \Omega_\infty(G^N, f) + \frac{4}{N}. $$

There is a contradiction since $G^N_2 \ne f^{\otimes 2}$ except if $g = h$.
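The counter-example can be checked by hand on a two-point space. The following sketch takes $E = \{0,1\}$ and two arbitrary distinct laws $g$, $h$ (these numerical values, and the helper names, are assumptions made only for illustration); it verifies with exact rational arithmetic that the first marginal of $G^N$ is $f$ while the second marginal differs from $f^{\otimes 2}$.

```python
from fractions import Fraction
from itertools import product

def tensor_power(p, n):
    """Law of n i.i.d. draws from p (dict point -> probability) on E^n."""
    law = {}
    for pts in product(p, repeat=n):
        w = Fraction(1)
        for x in pts:
            w *= p[x]
        law[pts] = w
    return law

def marginal(law, j):
    """Marginal of a law on E^n on its first j coordinates."""
    out = {}
    for pts, w in law.items():
        out[pts[:j]] = out.get(pts[:j], Fraction(0)) + w
    return out

def mixture(N, g, h):
    """G^N = (g^{⊗N} + h^{⊗N}) / 2."""
    gN, hN = tensor_power(g, N), tensor_power(h, N)
    return {key: (gN[key] + hN[key]) / 2 for key in gN}

# Two distinct laws on E = {0, 1} and their half-sum f.
g = {0: Fraction(1, 4), 1: Fraction(3, 4)}
h = {0: Fraction(3, 4), 1: Fraction(1, 4)}
f = {x: (g[x] + h[x]) / 2 for x in g}

GN = mixture(4, g, h)
G1, G2, f2 = marginal(GN, 1), marginal(GN, 2), tensor_power(f, 2)
print(G1 == {(x,): f[x] for x in f})           # first marginal is exactly f
print(sum(abs(G2[key] - f2[key]) for key in f2))  # but G_2 stays away from f ⊗ f
```

The gap between $G^N_2$ and $f^{\otimes 2}$ does not depend on $N$, which is exactly why $\Omega_1$ cannot control $\Omega_\infty$.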
We begin with some probably well known elementary inequalities and identities concerning Monge-Kantorovich-Wasserstein distances in product spaces. For the sake of completeness we will nevertheless sketch their proofs. Remark that the first two formulas are particularly simple thanks to the choice of the normalization (2.1), and that they remain valid if we replace $d_{E^j}$ by the normalized $\ell^1$-distance $\frac1j \sum_i |x_i - y_i|$.

Proposition 2.6. a) For any $F^N, G^N \in P_{sym}(E^N)$ and $1 \le j \le N$, there holds

$$ W_1(F^N_j, G^N_j) \le \Big( \frac{j}{N} \Big[ \frac{N}{j} \Big] \Big)^{-1} W_1(F^N, G^N) \le 2\, W_1(F^N, G^N). \tag{2.20} $$

b) For any $f, g \in P(E)$, there holds

$$ W_1(f^{\otimes N}, g^{\otimes N}) = W_1(f,g). \tag{2.21} $$

c) For any $f, g, h \in P(E)$, there holds

$$ 2\, W_1(f \otimes h, g \otimes h) = W_1(f,g). \tag{2.22} $$

As an immediate corollary of (2.20) with $N := \ell$, $F^\ell := f^{\otimes \ell}$ and $G^\ell := G^N_\ell$, we obtain the first inequality (2.16) of Theorem 2.4.

As can be seen in the following proof, similar results also hold for MKW distances constructed with an arbitrary distance $D$ and exponent $p$, and therefore for the $W_2$ distance. We do not state them precisely, but they will be useful in the proof of the next Lemma 2.7.

Proof of Proposition 2.6.

Proof of (2.20). Consider $\pi \in \Pi(F^N, G^N)$ an optimal transference plan in (2.2). Introducing the Euclidean division $N = nj + r$, $0 \le r \le j-1$, and writing $X = (X_1, \dots, X_n, X_0) \in E^N$, $Y = (Y_1, \dots, Y_n, Y_0) \in E^N$, with $X_i, Y_i \in E^j$, $1 \le i \le n$, and $X_0, Y_0 \in E^r$, we have

$$ W_1(F^N, G^N) = \int_{E^{2N}} d_{E^N}(X,Y)\, \pi(dX,dY) = \frac1N \int_{E^{2N}} \Big( \sum_{i=1}^n j\, d_{E^j}(X_i,Y_i) + r\, d_{E^r}(X_0,Y_0) \Big)\, \pi(dX,dY) \ge \frac{j}{N} \sum_{i=1}^n \int_{E^{2j}} d_{E^j}(X_i,Y_i)\, \pi_i(dX_i,dY_i), $$

with $\pi_i \in \Pi(F_i, G_i)$, where $F_i$ and $G_i \in P(E^j)$ denote the marginal probability measures of $F^N$ and $G^N$ on the $i$-th block of variables. From the symmetry hypothesis, we have $F_i = F_1 = F^N_j$ and $G_i = G_1 = G^N_j$ for any $1 \le i \le n$. As a consequence, we have

$$ \int_{E^{2j}} d_{E^j}(X_i, Y_i)\, \pi_i(dX_i, dY_i) \ge W_1(F^N_j, G^N_j), $$

and we then deduce the first inequality in (2.20). Since the integer part $n := [N/j]$ is larger than $1$, we have

$$ \frac{j}{N} \Big[ \frac{N}{j} \Big] = \frac{nj}{nj + r} \ge \frac{nj}{nj + j} \ge \frac12, $$

from which we deduce the second inequality in (2.20).
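The elementary bound on the block factor can be checked exhaustively for small $N$. The sketch below (with the hypothetical helper `block_factor`) computes $\frac{j}{N}\big[\frac{N}{j}\big]$ exactly and confirms that it lies in $[\frac12, 1]$, with value $1$ exactly when $j$ divides $N$.

```python
from fractions import Fraction

def block_factor(N, j):
    """(j / N) * [N / j], where [.] is the integer part (Euclidean division N = n j + r)."""
    n = N // j
    return Fraction(n * j, N)

worst = min(block_factor(N, j) for N in range(1, 200) for j in range(1, N + 1))
print(worst)  # always at least 1/2, so the constant 2 in (2.20) is admissible
```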

Proof of (2.21). We consider $\alpha \in \Pi(f,g)$ an optimal transference plan for the $W_1(f,g)$ distance and we define the associated transference plan $\bar\pi := \alpha^{\otimes N} \in \Pi(f^{\otimes N}, g^{\otimes N})$ by

$$ \forall\, A_i, B_i \in \mathscr{B}_E, \qquad \bar\pi(A_1 \times \dots \times A_N \times B_1 \times \dots \times B_N) = \alpha(A_1 \times B_1) \times \dots \times \alpha(A_N \times B_N). $$

By definition of $W_1(f^{\otimes N}, g^{\otimes N})$, we then have

$$ W_1(f^{\otimes N}, g^{\otimes N}) \le \int_{E^{2N}} \frac1N \sum_{i=1}^N d_E(x_i, y_i)\, \bar\pi(dX,dY) = W_1(f,g). $$

Since the first inequality in (2.20) in the case $j = 1$ implies the reverse inequality, the above inequality is an equality.

Proof of (2.22). On the one hand, from the definition of the distance $W_1$ by transference plans, we have for an optimal transference plan $\pi \in \Pi(f \otimes h, g \otimes h)$ the inequality

$$ W_1(f \otimes h, g \otimes h) = \frac12 \int_{E^4} \big( d_E(x_1,y_1) + d_E(x_2,y_2) \big)\, \pi(dx_1, dx_2, dy_1, dy_2) \ge \frac12 \int_{E^2} d_E(x_1,y_1)\, \pi_1(dx_1, dy_1) \ge \frac12\, W_1(f,g), $$

since the 1-marginal $\pi_1$ defined by $\pi_1(A \times B) = \pi(A \times E \times B \times E)$ for any $A, B \in \mathscr{B}_E$ belongs to the set of transference plans $\Pi(f,g)$.

On the other hand, considering an optimal transference plan $\pi \in \Pi(f,g)$ for the $W_1$ distance, we define the associated transference plan $\bar\pi(dx, dy) := \pi(dx_1, dy_1)\, h(dx_2)\, \delta_{y_2 = x_2} \in \Pi(f \otimes h, g \otimes h)$, and we observe that

$$ W_1(f \otimes h, g \otimes h) \le \frac12 \int_{E^4} \big( d_E(x_1,y_1) + d_E(x_2,y_2) \big)\, \bar\pi(dx_1, dx_2, dy_1, dy_2) = \frac12 \int_{E^2} d_E(x_1,y_1)\, \pi(dx_1, dy_1) = \frac12\, W_1(f,g). $$

We obtain (2.22) by gathering these two inequalities. □



We next prove another lemma that allows us to compare a distance between measures on $P(P(E))$ and a distance between their marginals on $E^j$, and thus to compare $\Omega_j$ and $\Omega_\infty$.

Lemma 2.7. For any distance $D$ on $E$ and $p \ge 1$, extend $D$ on $E^j$ with $D_{j,p}(V,W)^p = \frac1j \sum_{i=1}^j D(v_i, w_i)^p$, and define the associated MKW distance $W_{D_{j,p},p}$ on $P(E^j)$ and the MKW distance $\mathcal{W}_{W_{D,p},p}$ on $P(P(E))$ associated to $W_{D,p}$ and $p$. Let $\alpha$ and $\beta$ be two probability measures on $P(P(E))$. Then, for any $j \in \mathbb{N}$,

$$ W_{D_{j,p},p}(\alpha_j, \beta_j) \le \mathcal{W}_{W_{D,p},p}(\alpha, \beta). \tag{2.23} $$

That is in particular true for the MKW distances $W_1$ and $W_2$ defined in section 2.1:

$$ \forall\, j \in \mathbb{N}, \qquad W_2(\alpha_j, \beta_j) \le \mathcal{W}_2(\alpha, \beta), \qquad W_1(\alpha_j, \beta_j) \le \mathcal{W}_1(\alpha, \beta). $$

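Proof of Lemma 2.7. For simplicity we denote, for any $j$, $W_{D_{j,p},p} = W_D$. We choose any transference plan $\pi$ between $\alpha$ and $\beta$ and write

$$ \big[ W_D(\alpha_j, \beta_j) \big]^p = \Big[ W_D\Big( \int \rho^{\otimes j}\, \alpha(d\rho), \int \eta^{\otimes j}\, \beta(d\eta) \Big) \Big]^p = \Big[ W_D\Big( \int \rho^{\otimes j}\, \pi(d\rho,d\eta), \int \eta^{\otimes j}\, \pi(d\rho,d\eta) \Big) \Big]^p $$
$$ \le \int \big[ W_D(\rho^{\otimes j}, \eta^{\otimes j}) \big]^p\, \pi(d\rho,d\eta) = \int \big[ W_D(\rho, \eta) \big]^p\, \pi(d\rho,d\eta), $$

where we have used the convexity property of the Wasserstein distance, the equivalent of equality (2.21) in our general case, and the Jensen inequality. Optimizing over $\pi$, we obtain the claimed inequality. □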

As a consequence of a classical combinatorial trick, which goes back at least to [36], we have

Lemma 2.8 (Quantification of the equivalence $G^N_j \sim \hat G^N_j$). For any $G^N \in P_{sym}(E^N)$ and any $1 \le j \le 1 + N/2$, we have

$$ \|G^N_j - \hat G^N_j\|_{TV} \le 2\, \frac{j(j-1)}{N} \qquad \text{and} \qquad W_1(G^N_j, \hat G^N_j) \le \frac{j(j-1)}{N}, $$

and in particular the first marginals are equal: $G^N_1 = \hat G^N_1$.

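Proof of Lemma 2.8. The second inequality is a straightforward consequence of the first inequality together with the use of

$$ W_1(G^N_j, \hat G^N_j) \le \frac12\, \|G^N_j - \hat G^N_j\|_{TV}. $$

A proof of the latter may be found in [72, Proposition 7.10], in a slightly different context. Here, the better factor $1/2$ can be obtained because of the stronger assumptions of our setting (the distance $d_{E^j}$ we deal with here is bounded by $1$).

The first inequality is a simple and classical combinatorial computation, see for instance [36], [67, Proposition 2.2], [57, Lemma 4.2] or [55, Lemma 3.3]. We briefly sketch the proof for the convenience of the reader.

For $1 \le j \le N$, we denote by $\mathcal{C}^N_j$ the set of maps from $\{1, \dots, j\}$ into $\{1, \dots, N\}$, and by $\mathcal{A}^N_j$ the subset of $\mathcal{C}^N_j$ made of the one-to-one maps. Remark that we have

$$ |\mathcal{C}^N_j| = N^j, \qquad |\mathcal{A}^N_j| = \frac{N!}{(N-j)!}. $$

Thanks to the symmetry assumption made on $G^N$, we may write for any $\varphi \in C_b(E^j)$

$$ \langle G^N_j, \varphi \rangle = \int_{E^N} \varphi(x_1, \dots, x_j)\, G^N(dX) = \frac{(N-j)!}{N!} \sum_{\sigma \in \mathcal{A}^N_j} \int_{E^N} \varphi(x_{\sigma(1)}, \dots, x_{\sigma(j)})\, G^N(dX). $$

From the definition of $\hat G^N_j$ we also get

$$ \langle \hat G^N_j, \varphi \rangle = \int_{P(E)} \int_{E^j} \varphi(y_1, \dots, y_j)\, \rho^{\otimes j}(dY)\, \hat G^N(d\rho) = \int_{E^N} \Big( \int_{E^j} \varphi(y_1, \dots, y_j)\, (\mu^N_X)^{\otimes j}(dY) \Big)\, G^N(dX) = \frac{1}{N^j} \sum_{\sigma \in \mathcal{C}^N_j} \int_{E^N} \varphi(x_{\sigma(1)}, \dots, x_{\sigma(j)})\, G^N(dX). $$

The difference is then equal to

$$ \langle \hat G^N_j - G^N_j, \varphi \rangle = \Big( \frac{1}{N^j} - \frac{(N-j)!}{N!} \Big) \sum_{\sigma \in \mathcal{A}^N_j} \int_{E^N} \varphi(x_{\sigma(1)}, \dots, x_{\sigma(j)})\, G^N(dX) + \frac{1}{N^j} \sum_{\sigma \in \mathcal{C}^N_j \setminus \mathcal{A}^N_j} \int_{E^N} \varphi(x_{\sigma(1)}, \dots, x_{\sigma(j)})\, G^N(dX), $$

and may be bounded by

$$ |\langle \hat G^N_j - G^N_j, \varphi \rangle| \le \Big( 1 - \frac{N!}{(N-j)!\, N^j} \Big) \|\varphi\|_{L^\infty} + \frac{1}{N^j}\, |\mathcal{C}^N_j \setminus \mathcal{A}^N_j|\, \|\varphi\|_{L^\infty} = 2\, \Big( 1 - \frac{N!}{N^j\, (N-j)!} \Big) \|\varphi\|_{L^\infty}. $$

For $N \ge 2(j-1)$, we can bound the right hand side thanks to

$$ 1 - \frac{N!}{N^j\, (N-j)!} = 1 - \prod_{i=0}^{j-1} \Big( 1 - \frac{i}{N} \Big) = 1 - \exp\Big( \sum_{i=0}^{j-1} \ln\Big( 1 - \frac{i}{N} \Big) \Big) \le 1 - \exp\Big( -2 \sum_{i=0}^{j-1} \frac{i}{N} \Big) \le \frac{j(j-1)}{N}, $$

where we have used

$$ \forall\, x \in [0,1/2], \quad \ln(1-x) \ge -2x \qquad \text{and} \qquad \forall\, x \in \mathbb{R}, \quad e^{-x} \ge 1 - x. $$

We eventually get for $j \le 1 + N/2$

$$ \|\hat G^N_j - G^N_j\|_{TV} = \sup_{\|\varphi\|_\infty \le 1} \langle \hat G^N_j - G^N_j, \varphi \rangle \le 2\, \frac{j(j-1)}{N}, $$

which ends the proof. □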
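The combinatorial quantity appearing in this proof is easy to tabulate exactly. The sketch below (function names are hypothetical) compares $2\big(1 - \frac{N!}{(N-j)!\,N^j}\big)$ with the simpler bound $2j(j-1)/N$ over the range $1 \le j \le 1 + N/2$ used in the lemma.

```python
from fractions import Fraction
from math import factorial

def tv_bound_exact(N, j):
    """2 * (1 - N! / ((N-j)! N^j)): twice the proportion of non-injective maps {1..j} -> {1..N}."""
    return 2 * (1 - Fraction(factorial(N), factorial(N - j) * N**j))

def tv_bound_simple(N, j):
    """The simplified bound 2 j (j - 1) / N."""
    return Fraction(2 * j * (j - 1), N)

for N, j in [(10, 2), (100, 2), (100, 5), (1000, 10)]:
    print(N, j, float(tv_bound_exact(N, j)), float(tv_bound_simple(N, j)))
```

For $j = 1$ both quantities vanish, which is consistent with the equality of the first marginals.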

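Applying the previous Lemmas 2.7 and 2.8, we can bound $\Omega_j$ by $\Omega_\infty$ and some remainder. This is the second inequality (2.17) of Theorem 2.4.

Proof of inequality (2.17) in Theorem 2.4. We simply write

$$ \Omega_j(G^N, f) = W_1(G^N_j, f^{\otimes j}) \le W_1(G^N_j, \hat G^N_j) + W_1(\hat G^N_j, f^{\otimes j}) \le \frac{j(j-1)}{N} + \mathcal{W}_1(\hat G^N, \delta_f) \le \frac{j^2}{N} + \Omega_\infty(G^N; f), $$

thanks to the two previous Lemmas 2.7 and 2.8 (Lemma 2.7 being applied with $\alpha = \hat G^N$ and $\beta = \delta_f$, so that $\beta_j = f^{\otimes j}$). □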

We now establish the key estimate which will lead to the third inequality (2.18) in Theorem 2.4, where $\Omega_\infty$ is controlled by $\Omega_2$. Following [55, Lemma 4.2], the main idea is to use as an intermediate step the $H^{-s}$ norm on $P(E)$, rather than the Wasserstein $W_1$ distance, because it is a monomial function of order two on $P(E)$, and thus has a nice algebraic structure. This fact is stated in the following elementary lemma.

Lemma 2.9. For $s > d/2$, define $\Phi_s : \mathbb{R}^d \to \mathbb{R}$ by

$$ \forall\, z \in \mathbb{R}^d, \qquad \Phi_s(z) := \int_{\mathbb{R}^d} e^{-i\, z \cdot \xi}\, \frac{d\xi}{\langle \xi \rangle^{2s}}. \tag{2.24} $$

The function $\Phi_s$ is radial, bounded, and furthermore if $s > \frac{d+1}{2}$, it is Lipschitz. For any $\rho, \eta \in P(E)$, there holds

$$ \|\rho - \eta\|^2_{H^{-s}} = \int_{\mathbb{R}^{2d}} \Phi_s(x-y)\, (\rho^{\otimes 2} - \rho \otimes \eta)(dx,dy) + \int_{\mathbb{R}^{2d}} \Phi_s(x-y)\, (\eta^{\otimes 2} - \eta \otimes \rho)(dx,dy), \tag{2.25} $$

and for any $\rho \in P(E)$

$$ \|\rho\|^2_{H^{-s}} = \int_{\mathbb{R}^{2d}} \Phi_s(x-y)\, \rho^{\otimes 2}(dx,dy), $$

which means that the squared $H^{-s}$ norm on $P(E)$ is the monomial function of order two associated to the function $(x,y) \mapsto \Phi_s(x-y)$.
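As an illustration of this kernel representation, consider the explicit case $d = 1$, $s = 1$, where $\Phi_1(z) = \int_{\mathbb{R}} e^{-iz\xi} (1+\xi^2)^{-1} d\xi = \pi e^{-|z|}$. The sketch below assumes the unnormalized Fourier convention $\hat\rho(\xi) = \int e^{-ix\xi} \rho(dx)$ and evaluates the squared $H^{-1}$ distance between two empirical measures through (2.25); the function names are hypothetical.

```python
import math

def phi1(z):
    """Phi_s for d = 1, s = 1: the integral of exp(-i z xi) / (1 + xi^2) equals pi * exp(-|z|)."""
    return math.pi * math.exp(-abs(z))

def sq_norm_diff(xs, ys):
    """Squared H^{-1} distance between the empirical measures of xs and ys, via (2.25)."""
    def pair(us, vs):
        # Integral of Phi_1(x - y) against the product of the two empirical measures.
        return sum(phi1(u - v) for u in us for v in vs) / (len(us) * len(vs))
    return pair(xs, xs) - 2 * pair(xs, ys) + pair(ys, ys)

x, y = 0.2, 0.9
# For two Dirac masses the formula reduces to 2 * (Phi_1(0) - Phi_1(x - y)).
print(sq_norm_diff([x], [y]), 2 * math.pi * (1 - math.exp(-abs(x - y))))
```

In this example the squared distance between two Dirac masses is bounded by $2\pi \min(|x-y|,1)$, in line with the Lipschitz regularity of the kernel used in point (iii) of Lemma 2.1.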

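Proof of Lemma 2.9. We obtain that $\Phi_s$ is bounded from the fact that $\int_{\mathbb{R}^d} \langle\xi\rangle^{-2s}\, d\xi$ is finite for $s > d/2$, and that it is Lipschitz from the fact that $\int_{\mathbb{R}^d} \langle\xi\rangle^{1-2s}\, d\xi$ is finite when $s > (d+1)/2$. We now prove (2.25). Using the Fourier transform definition of the Hilbert norm of $H^{-s}(\mathbb{R}^d)$, we have for any $\rho, \eta \in H^{-s}(\mathbb{R}^d)$, and then for any $\rho, \eta \in P(E) \subset P(\mathbb{R}^d) \subset H^{-s}(\mathbb{R}^d)$,

$$ \|\rho - \eta\|^2_{H^{-s}} = \int_{\mathbb{R}^d} \big( \hat\rho(\xi) - \hat\eta(\xi) \big)\, \overline{\big( \hat\rho(\xi) - \hat\eta(\xi) \big)}\, \frac{d\xi}{\langle\xi\rangle^{2s}} $$
$$ = \int_{\mathbb{R}^d} \Big( \int_{\mathbb{R}^{2d}} \big( \rho(dx) - \eta(dx) \big) \big( \rho(dy) - \eta(dy) \big)\, e^{-i (x-y) \cdot \xi} \Big)\, \frac{d\xi}{\langle\xi\rangle^{2s}} $$
$$ = \int_{\mathbb{R}^{2d}} \Phi_s(x-y)\, (\rho^{\otimes 2} - \rho \otimes \eta)(dx,dy) + \int_{\mathbb{R}^{2d}} \Phi_s(x-y)\, (\eta^{\otimes 2} - \eta \otimes \rho)(dx,dy). $$

The last identity follows from (2.25) by choosing $\eta = 0$. □

Thanks to that lemma, we will be able to obtain the following key estimate.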

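Proposition 2.10. For any $s > \frac{d+1}{2}$ there exists a constant $C = 2\, \|\Phi_s\|_{Lip} \in (0,\infty)$, bounded in terms of $c_d$ and $(2s-d-1)^{-1}$ (where $c_d$ denotes the surface of the unit sphere of $\mathbb{R}^d$), such that for any $G^N \in P_{sym}(E^N)$, $N \ge 1$, $f \in P(E)$, there holds

$$ \mathcal{W}_{H^{-s}}(\hat G^N, \delta_f)^2 \le C\, W_1(\hat G^N_2, f \otimes f). \tag{2.26} $$

Proof of Proposition 2.10. Because $P(E) \subset P(\mathbb{R}^d) \subset H^{-s}(\mathbb{R}^d)$ for $s > \frac d2$ and $\Pi(\hat G^N, \delta_f) = \{ \hat G^N \otimes \delta_f \}$, we have

$$ \mathcal{W}_{H^{-s}}(\hat G^N, \delta_f)^2 = \inf_{\pi \in \Pi(\hat G^N, \delta_f)} I[\pi] = I[\hat G^N \otimes \delta_f], $$

with cost functional

$$ I[\pi] := \iint_{P(E) \times P(E)} \|\rho - \eta\|^2_{H^{-s}}\, \pi(d\rho, d\eta). $$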
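Using Lemma 2.9, we have

$$ I[\hat G^N \otimes \delta_f] = \int_{P(E)} \Big\{ \int_{\mathbb{R}^{2d}} \Phi_s(x-y)\, (\rho^{\otimes 2} - \rho \otimes f)(dx,dy) \Big\}\, \hat G^N(d\rho) + \int_{P(E)} \Big\{ \int_{\mathbb{R}^{2d}} \Phi_s(x-y)\, (f^{\otimes 2} - f \otimes \rho)(dx,dy) \Big\}\, \hat G^N(d\rho) $$
$$ = \int_{E^2} \Phi_s(x-y)\, \big[ \hat G^N_2(dx,dy) - \hat G^N_1(dx)\, f(dy) \big] + \int_{E^2} \Phi_s(x-y)\, \big[ f(dx)\, f(dy) - f(dx)\, \hat G^N_1(dy) \big]. $$

Now we may bound the cost functional as follows:

$$ I[\hat G^N \otimes \delta_f] \le \|\Phi_s\|_{Lip}\, \big[ W_1(\hat G^N_2, \hat G^N_1 \otimes f) + W_1(f \otimes f, f \otimes \hat G^N_1) \big] $$
$$ \le \|\Phi_s\|_{Lip}\, \big[ W_1(\hat G^N_2, f \otimes f) + 2\, W_1(f \otimes f, \hat G^N_1 \otimes f) \big] $$
$$ \le \|\Phi_s\|_{Lip}\, \big[ W_1(\hat G^N_2, f \otimes f) + W_1(f, \hat G^N_1) \big] $$
$$ \le 2\, \|\Phi_s\|_{Lip}\, W_1(\hat G^N_2, f \otimes f), $$

where we have used successively the Kantorovich-Rubinstein duality formula (2.3), the triangular inequality, the identity (2.22), and the first inequality in (2.20) together with the fact that $(\hat G^N_2)_1 = \hat G^N_1$. □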



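Putting together Proposition 2.10, Lemma 2.8 above and Lemma 2.3 on comparison of distances in $P(P(E))$, we may prove inequality (2.18) of Theorem 2.4.

Proof of inequality (2.18) in Theorem 2.4. We define $s := \frac{1}{2\gamma} - \frac{d}{2k}$. Notice that $s > \frac{d+1}{2} \ge 1$ thanks to the conditions satisfied by $\gamma$ and $k$. We can thus apply point i) of Lemma 2.3, Proposition 2.10 and then Lemma 2.8 in order to get

$$ \Omega_\infty(G^N; f) = \mathcal{W}_1(\hat G^N, \delta_f) \le C(d,s)\, \mathcal{M}_k^{\frac{d}{d+2ks}}\, \mathcal{W}_{H^{-s}}(\hat G^N, \delta_f)^{\frac{2k}{d+2ks}} $$
$$ \le C(d,s)\, \big( 2\, \|\Phi_s\|_{Lip} \big)^{\frac{k}{d+2ks}}\, \mathcal{M}_k^{\frac{d}{d+2ks}}\, W_1(\hat G^N_2, f^{\otimes 2})^{\frac{k}{d+2ks}} $$
$$ \le C(d,\gamma,k)\, \mathcal{M}_k^{1/k}\, \Big( W_1(G^N_2, f^{\otimes 2}) + \frac1N \Big)^{\gamma}, $$

since $\gamma = \frac{k}{d+2ks}$. This is the claimed inequality thanks to the definition of $\Omega_2$. It is important to notice that the constant $C(d,\gamma,k)$ of the last line depends on $d$, $k$ and $\gamma$ via $s$. But as explained at the end of Lemma 2.1, apart from the factor coming from $\|\Phi_s\|_{Lip}$, it can be chosen independent of $k$ and $\gamma$ if $s = \frac{1}{2\gamma} - \frac{d}{2k}$ remains in a compact set. □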



With stronger moment conditions on the probability measures $f$ and $G^N_1$, we may improve the exponent in the right hand side of (2.18) and therefore the rate of convergence to chaos. Introducing the exponential moment

$$ \forall\, F \in P(E), \qquad M_{\beta,\lambda}(F) := \int_E e^{\lambda |x|^\beta}\, F(dx), \tag{2.27} $$

$E = \mathbb{R}^d$, $\beta, \lambda > 0$, we have the following result.
Corollary 2.11. (i) There exists a constant $C = C(d)$ such that if the supports of $f$ and $G^N_1$ are both contained in the ball $B(0,R)$, for a positive $R$, then

$$ \Omega_\infty(G^N; f) \le C\, R\, \Big( \Omega_2(G^N; f) + \frac1N \Big)^{\frac{1}{d+1}}\, \Big| \ln\Big( \Omega_2(G^N; f) + \frac1N \Big) \Big|. \tag{2.28} $$

(ii) There exists a constant $C = C(d,\beta)$ such that if $f$ and $G^N_1$ have bounded exponential moment $M_{\beta,\lambda}$ for some $\beta, \lambda > 0$, there holds



(2.29) tl 00 (G N ; f)<°r (n 2 (G N ; /) + 1) 



1 - ■ 1+ 1 



\n[n 2 (G N ;f) + ^ 



A* 

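where $K := \max(M_{\beta,\lambda}(f), M_{\beta,\lambda}(G^N_1))$.

Proof of Corollary 2.11.

Step 1. The compact support case. Here we simply have $M_k(f) \le R^k$ and the same for the moments of $G^N_1$. Applying (2.18) with the explicit formula for the constant $C$, we get for any $0 < \gamma < \frac{1}{d+1}$ and $k > \frac{d}{\gamma^{-1} - d - 1}$

$$ \Omega_\infty(G^N; f) \le \frac{C(d,\gamma,k)}{\gamma^{-1} - d/k - d - 1}\, R\, \Big( \Omega_2(G^N; f) + \frac1N \Big)^{\gamma}. $$

And we use the remark at the end of the previous proof, which allows us to replace $C(d,\gamma,k)$ by $C(d)$ if $s = \frac{1}{2\gamma} - \frac{d}{2k}$ is restricted to some compact subset of $[1,+\infty)$. It will be the case in the sequel since we shall choose $k$ large and $\gamma$ close to $\frac{1}{d+1}$. Letting $k \to +\infty$ leads to

$$ \Omega_\infty(G^N; f) \le \frac{C(d)}{\gamma^{-1} - d - 1}\, R\, \Big( \Omega_2(G^N; f) + \frac1N \Big)^{\gamma}. $$

Denoting $\alpha := \gamma^{-1} - d - 1$ and $a := \Omega_2(G^N; f) + \frac1N$, which we assume smaller than $\frac12$, the r.h.s. can be rewritten

$$ \Omega_\infty(G^N; f) \le C(d)\, \frac{R}{\alpha}\, a^{\frac{1}{d+1+\alpha}}. $$

Some optimization leads to the natural choice $\alpha = \frac{2}{|\ln a|}$. It comes

$$ \Omega_\infty(G^N; f) \le C(d)\, R\, |\ln a|\, a^{\frac{1}{d+1}}\, a^{\frac{1}{d+1+\alpha} - \frac{1}{d+1}}. $$

Since $-\frac{\alpha}{4} \le \frac{1}{d+1+\alpha} - \frac{1}{d+1} \le 0$, we deduce

$$ a^{\frac{1}{d+1+\alpha} - \frac{1}{d+1}} \le a^{-\frac{1}{2 |\ln a|}} = e^{1/2}, $$

and this concludes the proof of point (i).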

Step 2. The case of exponential moment. 

' l 

k moment 

and it implies with our notations Jl£ < (2K) 1 / k . Applying (2.18) with the 

explicit formula for the constant C and the notation a of the previous step, we get for any 
< 7 < sir and k > 



Using the elementary inequality x k < ( e x ^ , we get the following bound on the 
Here we cannot take the limit as $k \to \infty$, but optimizing in $k$ the second fraction of the
r.h.s, we choose k satisfying ^ — d — 1 = ¥ and get the bound 



Still denoting a = ^ — d — 1 = ^, the choice a = 2 leads this time to the bound 

which concludes the proof. □ 

Remark 2.12. Inequality (2.18) in Theorem 2.4 says in particular that for any $k > 0$ and $0 < \gamma < (d + 1 + d/k)^{-1}$ there exists a constant $C := C(d,\gamma,k)$ such that for any $f \in P(E)$, there holds

$$ \Omega_\infty(f^{\otimes N}; f) \le \frac{C\, M_k(f)^{1/k}}{N^{\gamma}}. \tag{2.30} $$

For such a tensor product probability measures framework, the above rate can be improved in the following way.

Theorem 2.13 ([55, 10]). 1. For a moment weight exponent $k > 0$ and an exponent

(i) $\gamma = \gamma_c := (2 + 1/k)^{-1}$ when $d = 1$,

(ii) $\gamma \in (0, \gamma_c)$ with $\gamma_c := (2 + 2/k)^{-1}$ when $d = 2$,

(iii) $\gamma = \gamma_c := (d + d/k)^{-1}$ when $d \ge 3$,

there exists a finite constant $C := C(d,\gamma,k)$ such that (2.30) holds.

2. Moreover, for any moment weight exponents $\lambda, \beta > 0$, there exists a finite constant $C := C(d, \lambda, \beta, M_{\beta,\lambda}(f))$ such that
(2.31)

Ooo(/^;/) <C { ^0^, ifd=l, ^(r;/)<C (ln y , % fd>2. 

On the one hand, using similar Hilbert norm arguments as those used in the proof of Proposition 2.10 and inequality (2.18) in Theorem 2.4, the first point in Theorem 2.13 has been proved in [55, Lemma 4.2(iii)], with however the restriction $\gamma \in (0, \gamma_c)$ when $d \ge 1$. The optimal rate $O(1/N^{(2+1/k)^{-1}})$ in the critical case $\gamma = \gamma_c$, $d = 1$, is not mentioned in [55, Lemma 4.2(iii)] but follows from a careful but straightforward reading of the proof of [55, Lemma 4.2(iii)]. The better rate obtained in Theorem 2.13 with respect to (2.30) is due to the fact that for a tensor product measure one can work in the Hilbert space $H^{-s}$ with $s > d/2$ rather than with $s > (d+1)/2$ in the general case. The second point in Theorem 2.13 follows by adapting the proof of Corollary 2.11 to this tensor product measures framework.
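To compare the two families of rates, the following sketch tabulates the supremum of the admissible exponents in (2.18), namely $(d+1+d/k)^{-1}$, against the critical exponent $\gamma_c$ of Theorem 2.13 for tensor products. The function names are hypothetical, and $k$ is restricted to integers only so that exact fractions can be used.

```python
from fractions import Fraction

def gamma_general(d, k):
    """Supremum of the admissible exponents gamma in (2.18): 1 / (d + 1 + d/k)."""
    return 1 / (d + 1 + Fraction(d, k))

def gamma_tensor(d, k):
    """Critical exponent gamma_c of Theorem 2.13 (attained for d = 1 and d >= 3, excluded for d = 2)."""
    if d == 1:
        return 1 / (2 + Fraction(1, k))
    if d == 2:
        return 1 / (2 + Fraction(2, k))
    return 1 / (d + Fraction(d, k))

for d in (1, 2, 3, 6):
    print(d, [(str(gamma_general(d, k)), str(gamma_tensor(d, k))) for k in (2, 8)])
```

For $d = 1$ the two values coincide, the gain being that the critical exponent is attained; for $d \ge 2$ the tensor product exponent is strictly larger.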

On the other hand, using matching techniques, it has been proved in [26, 10] that (2.30) also holds true for the critical exponent $\gamma_c = 1/d$ in the compact support case (or exponential moment with $\beta = 1$) when $d \ge 3$, and for $\gamma_c = (d + d/k)^{-1}$ in the case of finite moment of order $k$ when $d \ge 3$. These last results thus slightly improve the estimates available thanks to our Hilbert norms technique. It is worth mentioning that the critical exponents are known to be optimal, see for instance [26, 4]. A natural question is whether the rates in inequality (2.18) and in Corollary 2.11 may be improved using similar arguments as in [26, 10].
We come to the proof of the last part of Theorem 2.4, which will be a consequence of the following proposition.

Proposition 2.14. For $F^N, G^N \in P_{sym}(E^N)$, there holds

$$ \mathcal{W}_1(\hat F^N, \hat G^N) = W_1(F^N, G^N). \tag{2.32} $$

Proof of Proposition 2.14. We split the proof into two steps.

Step 1. A reformulation of the problem. Since we are dealing with symmetric probability measures, it is natural to introduce the equivalence relation $\sim$ in $E^N$ by saying that $X = (x_1, \dots, x_N)$, $Y = (y_1, \dots, y_N) \in E^N$ are equivalent, and we write $X \sim Y$, if there exists a permutation $\sigma \in \mathfrak{S}_N$ such that $Y = X_\sigma := (x_{\sigma(1)}, \dots, x_{\sigma(N)})$. We also introduce on $E^N$ the "semi"-distance $w_1$

$$ w_1(X,Y) := \inf_{\sigma \in \mathfrak{S}_N} d_{E^N}(X, Y_\sigma) = \inf_{\sigma \in \mathfrak{S}_N} \frac1N \sum_{i=1}^N d_E(x_i, y_{\sigma(i)}), \tag{2.33} $$

which only satisfies $w_1(X,Y) = 0$ iff $X \sim Y$. We then introduce the associated MKW functional $W_{w_1}$. For $F^N, G^N \in P_{sym}(E^N)$,

$$ W_{w_1}(F^N, G^N) := \inf_{\pi^N \in \Pi(F^N, G^N)} \int_{E^N \times E^N} w_1(X,Y)\, \pi^N(dX, dY). $$

It is in fact a distance on the space of symmetric probability measures, but this point will also be a consequence of our proof. It is a classical result (see for instance [72, Introduction. Example: the discrete case]) that

$$ \forall\, X, Y \in E^N, \qquad W_1(\mu^N_X, \mu^N_Y) = w_1(X,Y), \tag{2.34} $$

(in short, it means that we do not need to split the small Dirac masses when we try to optimize the transport between two empirical measures). We recall the notation $p_N$ defined in section 2.1 for the map that sends a configuration to the associated empirical measure: $p_N(X) = \mu^N_X$.
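For small $N$, the semi-distance $w_1$ can be computed by brute force over all permutations, which makes its basic properties easy to check. The sketch below works in dimension one and assumes $d_E(x,y) = \min(|x-y|,1)$ together with the normalized distance $d_{E^N}(X,Y) = \frac1N \sum_i d_E(x_i,y_i)$; the function names are hypothetical, and the enumeration costs $N!$ evaluations, so it is only meant for illustration.

```python
import math
from itertools import permutations

def d_E(x, y):
    """Bounded distance on E (assumed to be min(|x - y|, 1))."""
    return min(abs(x - y), 1.0)

def d_EN(X, Y):
    """Normalized distance on E^N."""
    return sum(d_E(x, y) for x, y in zip(X, Y)) / len(X)

def w1(X, Y):
    """Semi-distance (2.33): best matching cost over all relabelings of Y."""
    return min(d_EN(X, Y_sigma) for Y_sigma in permutations(Y))

X = (0.0, 0.5, 2.0)
Y = (2.0, 0.0, 0.6)
# Relabeling Y brings the cost from 2.5/3 down to 0.1/3.
print(d_EN(X, Y), w1(X, Y))
```

By (2.34), the value returned by `w1` is also the $W_1$ distance between the two empirical measures $\mu^N_X$ and $\mu^N_Y$.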

Remark that its associated push-forward mapping restricted to the symmetric probability measures

$$ \hat p_N : P_{sym}(E^N) \to P(P_N(E)) \subset P(P(E)), \qquad G^N \mapsto \hat G^N := G^N_\sharp\, p_N, $$

is a bijection. Its inverse can be simply expressed thanks to a dual formulation: for $\alpha \in P(P_N(E))$, its inverse $\tilde\alpha = \hat p_N^{-1} \alpha$ is the probability measure satisfying

$$ \forall\, \varphi \in C_b(E^N), \qquad \int_{E^N} \varphi(X)\, \tilde\alpha(dX) = \int_{P_N(E)} \tilde\varphi(\rho)\, \alpha(d\rho), $$

where $\tilde\varphi(\rho) := \frac{1}{N!} \sum_{\sigma \in \mathfrak{S}_N} \varphi(X_\sigma)$, for any given $X$ such that $\rho = \mu^N_X$. Similarly, defining $P_{s,s}(E^N \times E^N)$ the subset of $P(E^N \times E^N)$ of probability measures which are invariant under permutations on the first and second blocks of $N$ variables separately, we have that

$$ \hat p_N^{\otimes 2} : P_{s,s}(E^N \times E^N) \to P(P_N(E) \times P_N(E)), \qquad \pi^N \mapsto \hat\pi^N := \pi^N_\sharp (p_N, p_N), $$

is a bijection.

The identity (2.34) and the bijection $\hat p_N^{\otimes 2}$ allow us to establish the identity

$$ \forall\, F^N, G^N \in P_{sym}(E^N), \qquad W_{w_1}(F^N, G^N) = \mathcal{W}_1(\hat F^N, \hat G^N). \tag{2.35} $$
Indeed, denoting $\Pi_{s,s}(F^N, G^N) = \Pi(F^N, G^N) \cap P_{s,s}(E^N \times E^N)$, we have

$$ W_{w_1}(F^N, G^N) = \inf_{\pi^N \in \Pi(F^N, G^N)} \int_{E^N \times E^N} w_1(X,Y)\, \pi^N(dX, dY) $$
$$ = \inf_{\pi^N \in \Pi_{s,s}(F^N, G^N)} \int_{E^N \times E^N} W_1(p_N(X), p_N(Y))\, \pi^N(dX, dY) $$
$$ = \inf_{\pi^N \in \Pi_{s,s}(F^N, G^N)} \int_{P_N(E) \times P_N(E)} W_1(\rho, \eta)\, \pi^N_\sharp (p_N, p_N)(d\rho, d\eta) $$
$$ = \inf_{\pi \in \Pi(\hat F^N, \hat G^N)} \int_{P(E) \times P(E)} W_1(\rho, \eta)\, \pi(d\rho, d\eta) = \mathcal{W}_1(\hat F^N, \hat G^N), $$

where we have essentially used the invariance $w_1(X,Y) = w_1(X_\sigma, Y_\tau)$ for any $\sigma, \tau \in \mathfrak{S}_N$ and the fact that $\hat p_N^{\otimes 2}$ is a bijection.

Step 2. The equality $W_{w_1} = W_1$. The interest of the reformulation (2.35) is that we can now work on one space: $E^N$. Remark that since $w_1(X,Y) \le d_{E^N}(X,Y)$, we always have $W_{w_1} \le W_1$, and the equality will hold only if one optimal transference plan for $W_{w_1}$ is concentrated on the set

$$ C := \Big\{ (X,Y) \in E^N \times E^N \ \text{s.t.}\ w_1(X,Y) = \inf_{\sigma \in \mathfrak{S}_N} d_{E^N}(X, Y_\sigma) = d_{E^N}(X,Y) \Big\}. $$

We choose an optimal transference plan $\pi$ for $W_{w_1}$. For simplicity we will assume that $\pi$ is symmetric, i.e. unchanged by the maps $P_\sigma : (X,Y) \mapsto (X_\sigma, Y_\sigma)$ for any $\sigma \in \mathfrak{S}_N$. If not, we replace it by its symmetrization $\frac{1}{N!} \sum_{\sigma} \pi_\sharp P_\sigma$, which will still be an optimal transference plan of $F^N$ onto $G^N$. Starting from $\pi$, we will construct a transference plan $\pi^* \in \Pi(F^N, G^N)$ such that

- i) $\pi^*$ is concentrated on $C$.

- ii) $I[\pi] = \int w_1(X,Y)\, \pi(dX,dY) = \int w_1(X,Y)\, \pi^*(dX,dY) = I[\pi^*]$.

Both properties then imply that

$$ W_{w_1}(F^N, G^N) = \int_{E^N \times E^N} w_1(X,Y)\, \pi(dX,dY) = \int_{E^N \times E^N} w_1(X,Y)\, \pi^*(dX,dY) = \int_{E^N \times E^N} d_{E^N}(X,Y)\, \pi^*(dX,dY) \ge W_1(F^N, G^N), $$

which is the desired inequality.

We define $\pi^*$ in the following way. First, we introduce for any $X, Y \in E^N$

$$ C_{X;Y} := \{ Z \in E^N ;\ Z \sim Y \ \text{and}\ d_{E^N}(X,Z) = w_1(X,Y) \} \subset E^N, $$
$$ \rho_{X;Y} := \frac{1}{N_{X;Y}} \sum_{Z \in C_{X;Y}} \delta_{(X,Z)} \in P(E^N \times E^N), \qquad N_{X;Y} := \# C_{X;Y} \in \mathbb{N}^*. $$

We note that $Z \in C_{X;Y}$ iff $Z \sim Y$ and $(X,Z) \in C$, so that $\mathrm{Supp}\, \rho_{X;Y} \subset C$. It can be shown that $(X,Y) \mapsto N_{X;Y}$ is a Borel map (it takes finite values and its level sets are closed) and that $E^N \times E^N \to P(E^N \times E^N)$, $(X,Y) \mapsto \rho_{X;Y}$ is also a Borel map if $P(E^N \times E^N)$ is endowed with the weak topology of measures. This allows us to define a transference plan $\pi^*$ by

$$ \pi^* := \int \rho_{X;Y}\, \pi(dX, dY) \in P(E^N \times E^N), $$

or in other words, for any $\psi \in C_b(E^N \times E^N)$, we have

$$ \langle \pi^*, \psi \rangle = \int_{E^{2N}} \frac{1}{N_{X;Y}} \sum_{Z \in C_{X;Y}} \int_{E^{2N}} \psi(X',Y')\, \delta_{(X,Z)}(dX', dY')\, \pi(dX, dY) = \int_{E^{2N}} \frac{1}{N_{X;Y}} \sum_{Z \in C_{X;Y}} \psi(X,Z)\, \pi(dX, dY). $$

It remains to proof that ir* satisfy the announced properties. Since px-,Y is supported in 
C for any (X, Y) G E N x 1?^, it is also the case for ir*. It is also not difficult to show that 
the transport cost for w\ is preserved. Indeed, we have 

/ d EN (X',Y')i:*(dX',dY') = [ f-J— V d E N (X, Z) J Tv(dX, dY) 

JE 2N JE 2N \ ^X;Y Z ^ X Y J 

(-j- E MX,Y)) n(dX,dY) 



E 2N 



Wl {X,Y)ir(dX, dY). 

E 2N 



The fact that 7r* has first marginal F is also clear since for any 92 G Cb(E ) 

/ ^(x'K(dx',dy') = / ^ vr(dx,dy) 

ip(X)ir(dX,dY)= [ ^(X)F N (dX). 
Je n 



>E 2N 

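For the second marginal, we shall use the following properties of $C_{X;Y}$ and $N_{X;Y}$:

$$\forall\, \tau \in \mathfrak{S}_N, \quad Z_\tau \in C_{X_\tau;Y_\tau} \iff Z \in C_{X;Y}, \quad \text{and thus } N_{X_\tau;Y_\tau} = N_{X;Y}.$$

Thanks to the invariance by symmetry of $\pi$ and $G^N$, we can write for any $\varphi \in C_b(E^N)$

$$\int_{E^{2N}} \varphi(Y)\,\pi^*(dX,dY) = \int_{E^{2N}} \Big(\frac{1}{N_{X;Y}} \sum_{Z \in C_{X;Y}} \varphi(Z)\Big)\pi(dX,dY)$$

$$= \frac{1}{N!} \sum_{\tau \in \mathfrak{S}_N} \int_{E^{2N}} \Big(\frac{1}{N_{X_\tau;Y_\tau}} \sum_{Z \in C_{X_\tau;Y_\tau}} \varphi(Z)\Big)\pi(dX,dY)$$

$$= \frac{1}{N!} \sum_{\tau \in \mathfrak{S}_N} \int_{E^{2N}} \Big(\frac{1}{N_{X;Y}} \sum_{Z \in C_{X;Y}} \varphi(Z_\tau)\Big)\pi(dX,dY)$$

$$= \int_{E^{2N}} \Big(\frac{1}{N_{X;Y}} \sum_{Z \in C_{X;Y}} \tilde\varphi(Z)\Big)\pi(dX,dY) = \int_{E^{2N}} \tilde\varphi(Y)\,\pi(dX,dY)$$

$$= \int_{E^N} \tilde\varphi(Y)\,G^N(dY) = \int_{E^N} \varphi(Y)\,G^N(dY),$$

where we have introduced the symmetrization of $\varphi$ defined by $\tilde\varphi(Z) := \frac{1}{N!}\sum_{\sigma \in \mathfrak{S}_N} \varphi(Z_\sigma)$ and we have used that $\tilde\varphi(Z) = \tilde\varphi(Y)$ for any $Z \in C_{X;Y}$ and the fact that $G^N$ is symmetric. This concludes the proof. □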
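
The construction of $C_{X;Y}$ and $p_{X;Y}$ can be made concrete on a toy case. The sketch below is only illustrative: it assumes $E = \mathbb{R}$ and takes $d_{E^N}$ to be the normalized $\ell^1$ distance (the definition of $d_{E^N}$ is not restated in this section), and the helper names `d_EN` and `w1_and_minimizers` are hypothetical. It enumerates the rearrangements $Z \sim Y$, computes $w_1(X,Y)$ as the minimal cost, and collects the minimizers.

```python
from itertools import permutations

def d_EN(X, Y):
    """Normalized l1 distance on E^N with E = R (an assumed choice of d_{E^N})."""
    return sum(abs(x - y) for x, y in zip(X, Y)) / len(X)

def w1_and_minimizers(X, Y):
    """Return w_1(X, Y) = min over rearrangements Z of Y of d_EN(X, Z),
    together with the set C_{X;Y} of rearrangements achieving the minimum."""
    costs = {Z: d_EN(X, Z) for Z in set(permutations(Y))}
    best = min(costs.values())
    C = {Z for Z, c in costs.items() if abs(c - best) < 1e-12}
    return best, C

X = (0.0, 0.0, 5.0)
Y = (1.0, 2.0, 4.0)
w1, C = w1_and_minimizers(X, Y)

# p_{X;Y} puts equal mass 1/N_{X;Y} on each pair (X, Z) with Z in C_{X;Y}
weights = {Z: 1.0 / len(C) for Z in C}
averaged_cost = sum(weight * d_EN(X, Z) for Z, weight in weights.items())
```

Here two rearrangements of $Y$ are optimal, so $N_{X;Y} = 2$, and averaging the cost $d_{E^N}(X,Z)$ against the weights of $p_{X;Y}$ returns $w_1(X,Y)$, which is the mechanism behind the preservation of the transport cost.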

Putting together Proposition 2.14 and (2.30), we obtain the inequality (2.19) of Theorem 2.4.

Proof of inequality (2.19) in Theorem 2.4. We have

$$|\Omega_N(G^N;f) - \Omega_\infty(G^N;f)| = |W_1(G^N, f^{\otimes N}) - \mathcal{W}_1(\hat G^N, \delta_f)| = |\mathcal{W}_1(\hat G^N, \widehat{f^{\otimes N}}) - \mathcal{W}_1(\hat G^N, \delta_f)| \le \frac{C\, M_k(f)^{1/k}}{N^{\gamma}},$$

where we have used the definition of $\Omega_N$, $\Omega_\infty$, the triangle inequality, Proposition 2.14 and (2.30). □



3. Entropy chaos and Fisher information chaos 

In this section $E \subset \mathbb{R}^d$ stands for an open set or the closure of an open set (so that the gradient of a function on $E$ is well defined).

3.1. Entropy chaos. The entropy of a probability measure on a compact subset of $\mathbb{R}^d$ with density $f\,dx$ is well defined by the formula $\int f \ln f$. On a (possibly) unbounded set $E$, we have to be more careful because the entropy may not be defined for probability measures decreasing too slowly at infinity. This is a well known issue, but we present here a rigorous definition for probability measures $F \in P(E^j)$ having a finite moment $M_k$ for some $k > 0$. It will be useful in Section 5 where we define the level 3 entropy and Fisher information on $P(P(E))$.

We emphasize that in the sequel we shall use the same notation $F$ for a probability measure and its density $F\,dx$ with respect to the Lebesgue measure, when the latter exists. For any $k > 0$ and $F \in P_k(E^j) \cap L^1$, we define the (opposite of the Boltzmann's) entropy

(3.1) $$H_j(F) := \int_{E^j} F \log F = \int_{E^j} h(F/G_k^j)\,G_k^j + \int_{E^j} F \log G_k^j \quad (=: H_j^{(1)}(F))$$

with $G_k^j(V) := c_k^j \exp(-|v_1|^k - \dots - |v_j|^k) \in P(E^j)$, where $c_k$ is chosen so that $G_k^j$ is a probability measure, and $h(s) := s \log s - s + 1$. The RHS term is well defined in $\mathbb{R} \cup \{+\infty\}$ as the sum of a nonnegative term and a finite real number, and it can be checked that it is equal to the middle term, which is thus meaningful. Next, we extend the entropy functional to any $F \in P_k(E^j)$ by setting

(3.2) $$H_j(F) := \sup_{\varphi_j \in C_b(E^j)} \big[\langle F, \varphi_j\rangle - H^*(\varphi_j)\big] + \int_{E^j} F \log G_k^j \quad (=: H_j^{(2)}(F))$$

where

$$H^*(\varphi_j) := \int_{E^j} h^*(\varphi_j)\,G_k^j$$

and where $h^*(t) := e^t - 1$ is the Legendre transform of $h$. Finally, we define the normalized entropy functional $H$ by

(3.3) $$\forall\, F \in P_k(E^j) \qquad H(F) := \frac{1}{j} H_j(F).$$

We start by recalling without proof a very classical result concerning the entropy.

Lemma 3.1. Let us fix $k > 0$. The entropy functional $P_k(E^j) \to \mathbb{R} \cup \{+\infty\}$, $\rho \mapsto H_j(\rho)$ is well defined by the expression (3.2), is convex and is l.s.c. for the following notion of converging sequences: $\rho_n \rightharpoonup \rho$ in the weak sense of measures in $P(E^j)$ and $\langle \rho_n, |v|^m\rangle$ is bounded for some $m > k$ (the same holds of course for $H$). Moreover, $H_j(F)$ does not depend on the choice of $k$ used in the expression (3.2),

$$H(F) \ge \log c_k - M_k(F) \qquad \forall\, F \in P_k(E^j),$$

and $H(F) < \infty$ iff $F \in L^1$, $F \log F \in L^1(E^j)$, and then $H(F) = H^{(1)}(F)$.

We also recall the definition of the (non-normalized) relative entropy between two probability measures $\rho$ and $\eta$ of $P(E^j)$:

(3.4) $$H_j(\rho|\eta) := \int_{E^j} \ln\Big(\frac{d\rho}{d\eta}\Big)\,d\rho = \int_{E^j} (g \log g + 1 - g)\,d\eta$$

with $g = \frac{d\rho}{d\eta}$ if $\rho$ is absolutely continuous with respect to $\eta$. If $g$ is not defined, then $H_j(\rho|\eta) := +\infty$. The associated normalized quantity is simply $H(\rho|\eta) := \frac{1}{j} H_j(\rho|\eta)$. The relative entropy is defined without moment assumption since the quantity under the last integral is nonnegative. It can also be defined using a dual formula similar to (3.2). For a fixed $\eta$ it has the same properties as the entropy.

We now give two elementary and well known results which are fundamental for the analysis of the entropy defined on product spaces.

Lemma 3.2. On $P_m(E)$, $m > 0$, the entropy satisfies the identity

(3.5) $$\forall\, f \in P_m(E) \qquad H(f^{\otimes j}) = H(f).$$

Proof of Lemma 3.2. If $f \in P_m(E)$ is a function such that $H(f) < \infty$, then we may use (3.1) as a definition and

$$H(f^{\otimes j}) = \frac{1}{j} \int_{E^j} f^{\otimes j} \log f^{\otimes j} = \int_{E^j} f^{\otimes j}(v_1,\dots,v_j) \log f(v_1) = H_1(f).$$

Conversely, $H_1(f) = \infty$ implies $H_j(f^{\otimes j}) = \infty$. □
Lemma 3.3. (i) For any functions $f, g \in L^1_m(E) \cap P(E)$, $m > 0$, there holds

(3.6) $$H(f) := \int_E f \log f \ge \int_E f \log g, \quad \text{or} \quad H(f|g) := \int_E f \log(f/g) \ge 0,$$

with equality only if $f = g$ a.e.

(ii) More generally, for any nonnegative functions $f, g \in L^1_m(E)$, $m > 0$, there holds

$$\int_E f \log\frac{f}{g} \ge F \log\frac{F}{G}, \quad \text{with } F := \int_E f, \quad G := \int_E g.$$

(iii) A consequence of (i) is that if $F \in P(E^j)$ has first marginal $f$ with $H(f) < +\infty$, then

$$H(F) \ge H(f) \quad \text{with equality only if } F = f^{\otimes j} \text{ a.e.}$$

(iv) The entropy is superadditive: for any $F \in P_m(E^{i+j}) \cap P_{sym}(E^{i+j})$, $i, j \in \mathbb{N}^*$, $m > 0$, the following inequality holds

(3.7) $$H_{i+j}(F_{i+j}) \ge H_i(F_i) + H_j(F_j) \quad \text{(non-normalized entropy)},$$

where $F_\ell$ as usual stands for the $\ell$-th marginal of $F$.

Proof of Lemma 3.3. (i) To obtain the inequality, write $H(f|g) = \int h(f/g)\,g$ and use the fact that $h(s) = s \log s - s + 1$ is a nonnegative function. Next there is equality only if $h(f/g) = 0$ a.e. on $\{f > 0\}$. Since $h$ vanishes only at $s = 1$, it means that $f = g$ a.e. on $\{f > 0\}$. Using that $\int f = \int g = 1$, we obtain the claimed equality.

(ii) We write

$$\int_E f \log\frac{f}{g} = F \int_E \frac{f}{F} \log\frac{f/F}{g/G} + \int_E f \log\frac{F}{G},$$

the first term is nonnegative thanks to (3.6) and the second term is the one which appears on the RHS of the claimed inequality.

(iii) We use the first inequality (3.6) on $E^j$ with $F$ and $f^{\otimes j}$:

$$H(F) = \frac{1}{j} \int_{E^j} F \log F \ge \frac{1}{j} \int_{E^j} F \log f^{\otimes j} = \int_{E^j} F(V) \log f(v_1)\,dV = H(f).$$

Using again the point (i), we see that equality can occur only if $F = f^{\otimes j}$ a.e.

(iv) Denote $h_\ell := H_\ell(F_\ell)$. If $h_{i+j} = +\infty$ there is nothing to prove. Otherwise, we have $h_{i+j} < \infty$ which in turn implies $F \in L^1(E^{i+j})$, then $F_i \in L^1(E^i)$, $F_j \in L^1(E^j)$, so that the entropy may be defined thanks to (3.1). In $\mathbb{R} \cup \{-\infty\}$, we compute

$$h_{i+j} - h_i - h_j = \int_{E^{i+j}} F_{i+j} \log F_{i+j} - \int_{E^{i+j}} F_{i+j} \log F_i(v_1,\dots,v_i) - \int_{E^{i+j}} F_{i+j} \log F_j(v_{i+1},\dots,v_{i+j})$$

$$= \int_{E^{i+j}} F_{i+j} \log F_{i+j} - \int_{E^{i+j}} F_{i+j} \log (F_i \otimes F_j) \ge 0,$$

thanks to (3.6). □
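
As a sanity check of Lemma 3.2 and of the superadditivity (3.7), one may look at a discrete analogue. The sketch below is only an illustration under a simplifying assumption: the Lebesgue measure on $E$ is replaced by the counting measure on the two-point set $\{0,1\}$, so that $H_j(F) = \sum F \log F$; the names `H`, `F2`, `F1` are hypothetical.

```python
import math

def H(p):
    """Non-normalized entropy sum p log p (opposite of the Shannon entropy),
    with the convention 0 log 0 = 0."""
    return sum(x * math.log(x) for x in p if x > 0)

# A symmetric (exchangeable) law F2 on {0,1}^2 which is not a tensor product
F2 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Its first marginal F1 on {0,1}
F1 = [F2[(0, 0)] + F2[(0, 1)], F2[(1, 0)] + F2[(1, 1)]]

H2 = H(F2.values())                        # H_2(F_2)
H1 = H(F1)                                 # H_1(F_1)
H_tensor = H([a * b for a in F1 for b in F1])  # H_2(F_1 tensor F_1)

superadditivity_gap = H2 - 2.0 * H1        # nonnegative by (3.7)
tensorization_gap = H_tensor - 2.0 * H1    # zero by Lemma 3.2
```

The gap `superadditivity_gap` is the relative entropy of `F2` with respect to the product of its marginals, which is how (3.7) is obtained from (3.6) in the proof above.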
Our first result shows that entropy chaos is a stronger notion than Kac's chaos.

Theorem 3.4 (Entropy and chaos). Consider $(G^N)$ a sequence of $P_{sym}(E^N)$ such that $\langle G^N_1, |v|^m\rangle \le a$ for any $N \ge 1$ and for some fixed $m, a > 0$ and consider $f \in P(E)$.

1) If $G^N_j \rightharpoonup F_j$ weakly in $P(E^j)$ for some given $j \ge 1$, then

(3.8) $$H(F_j) \le \liminf H(G^N).$$

In particular, when $(G^N)$ is $f$-Kac's chaotic, (3.8) holds for any $j \ge 1$ with $F_j := f^{\otimes j}$.

2) Conversely, if $(G^N)$ is $f$-entropy chaotic, then $(G^N)$ is $f$-Kac's chaotic.

Proof of Theorem 3.4. Step 1. For any $N \ge j$ we introduce the Euclidean decomposition $N = nj + r$, $0 \le r \le j - 1$, exactly as in the proof of Proposition 2.6. Iterating $n$ times the superadditivity inequality (3.7) we have

$$H_N(G^N) \ge n\,H_j(G^N_j) + H_r(G^N_r),$$

with the convention $H_r(G^N_r) = 0$ when $r = 0$. We get (3.8) by passing to the limit in that inequality divided by $N$, using that $H$ is l.s.c. and that $H_r(G^N_r)$ is bounded from below thanks to Lemma 3.1 and the condition on the moment.

Step 2. We assume that $(G^N)$ is $f$-entropy chaotic, that is

$$G^N_1 \rightharpoonup f \text{ weakly in } P(E) \quad \text{and} \quad H(G^N) \to H(f) < \infty.$$

Let us fix $j \ge 1$. The sequence $(G^N_j)$ being bounded in $P_m(E^j)$, there exists $F_j \in P(E^j)$ and a subsequence $(G^{N'})$ such that $G^{N'}_j \rightharpoonup F_j$ weakly in $P(E^j)$. Thanks to step 1, we have

$$H(F_j) \le \liminf H(G^{N'}_j) \le \liminf H(G^{N'}) = H(f) = H(f^{\otimes j}).$$

Since the first marginal of $F_j$ is $(F_j)_1 = \lim_{N \to +\infty} G^N_1 = f$, the third point of Lemma 3.3 gives that $F_j = f^{\otimes j}$ a.e. As a conclusion and because we have identified the limit, we have proved that the whole sequence $(G^N_j)$ weakly converges to $f^{\otimes j}$. □

3.2. Fisher chaos. We now establish similar results for the Fisher information functional. For an arbitrary probability measure $G \in P(E^j)$, we define the (non-normalized) Fisher information by

(3.9) $$I_j^{(1)}(G) := \begin{cases} \displaystyle\int_{E^j} \frac{|\nabla G|^2}{G} = \int_{E^j} |\nabla \ln G|^2\, G \in \mathbb{R} \cup \{+\infty\} & \text{if } G \in W^{1,1}(E^j), \\ +\infty & \text{if } G \notin W^{1,1}(E^j). \end{cases}$$

For $G \in P(E^j)$, we also give an alternative definition

(3.10) $$I_j^{(2)}(G) := \sup_{\psi \in C^1_b(E^j)^{dj}} \Big\langle G, -\frac{|\psi|^2}{4} - \mathrm{div}\,\psi \Big\rangle \in \mathbb{R} \cup \{+\infty\}.$$

Lemma 3.5. For all $j \in \mathbb{N}$, the identity $I_j^{(1)} = I_j^{(2)}$ holds on $P(E^j)$, and we simply denote by $I_j$ the usual (non-normalized) Fisher information and by $I = j^{-1} I_j$ the normalized Fisher information. The functionals $I_j$ and $I$ are proper, convex, l.s.c. (in the sense of the weak convergence of measures) on $P(E^j)$.

Proof of Lemma 3.5. For the sake of simplicity, we only deal with the case $j = 1$. We split the proof into two steps.

Step 1. Assume that $f \in W^{1,1}$. Since for all $\psi \in C^1_b(E)^d$

$$|\nabla \ln f|^2 - \nabla \ln f \cdot \psi + \frac{|\psi|^2}{4} = \Big|\nabla \ln f - \frac{\psi}{2}\Big|^2 \ge 0,$$

we have

$$I^{(1)}(f) = \int_E |\nabla \ln f|^2\, f \ge \int_E \Big(\nabla \ln f \cdot \psi - \frac{|\psi|^2}{4}\Big) f.$$

For any sequence $(\psi_n)$ of smooth functions approximating $2\nabla \ln f = 2\frac{\nabla f}{f}$, we obtain that

(3.11) $$I^{(1)}(f) = \lim_{n \to \infty} \int_E \Big(\nabla \ln f \cdot \psi_n - \frac{|\psi_n|^2}{4}\Big) f = \sup_{\psi \in C^1_b(E)^d} \int_E \Big(\nabla \ln f \cdot \psi - \frac{|\psi|^2}{4}\Big) f = \sup_{\psi \in C^1_b(E)^d} \int_E \Big(\nabla f \cdot \psi - f\,\frac{|\psi|^2}{4}\Big) =: I^{(3)}(f).$$

The remaining equality $I^{(3)} = I^{(2)}$ is just a simple integration by parts. Remark that maximizing sequences $(\psi_n)$ must converge (up to some subsequence) pointwise to $2\nabla \ln f$ a.e. on $\{f \ne 0\}$. We shall use that point in the sequel.

We also remark that this reformulation $I^{(2)}$ is also exactly the one obtained when using the general Fenchel-Moreau theorem on the convex function $(a,b) \mapsto \frac{|b|^2}{a}$ (which is used in the integral defining $I^{(1)}$).

Step 2. It remains to check that the equality $I^{(1)} = I^{(2)}$ is also true on $P(E) \setminus W^{1,1}(E)$. In other words that if $f \notin W^{1,1}(E)$ then $I^{(2)}(f) = +\infty$. In what follows, we prove the contrapositive: $I^{(2)}(f) < +\infty$ implies $f \in W^{1,1}(E)$. Once this is done, we will have $I^{(1)} = I^{(2)}$ everywhere, from which it follows that $I$ is l.s.c. in the sense of the weak convergence of measures.

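Consider $f \in P(E)$ and assume $I^{(2)}(f) < \infty$. We deduce that for any $\psi \in C^1_b(E)^d$ and any $t \in \mathbb{R}$

$$\int_E f \Big[-t^2\,\frac{|\psi|^2}{4} - t\,\mathrm{div}\,\psi\Big] \le I^{(2)}(f),$$

so that by optimizing in $t \in \mathbb{R}$ and using that $f \in P(E)$, we get

$$\Big|\int_E f\,\mathrm{div}\,\psi\Big|^2 \le 4\, I^{(2)}(f) \int_E f\,\frac{|\psi|^2}{4} \le I^{(2)}(f)\,\|\psi\|_\infty^2.$$

That inequality implies $f \in BV(E)$ and $\|\nabla f\|_{TV} \le \sqrt{I^{(2)}(f)}$. Using that $f \in BV(E)$ and making an integration by parts in the definition of $I^{(2)}(f)$, we find

$$I^{(2)}(f) = \sup_{\psi \in C^1_b(E)^d} \int_E \Big(\nabla f \cdot \psi - f\,\frac{|\psi|^2}{4}\Big) = I^{(3)}(f).$$

Now, for any compact subset $K \subset E$ with zero Lebesgue measure, we may find a sequence $\rho_\varepsilon \in C_c(E)$ such that $0 \le \rho_\varepsilon \le 1$, $\rho_\varepsilon = 1$ on $K$ and $\rho_\varepsilon \to 0$ a.e., so that for any $t > 0$ and using that $f \in BV(E) \subset L^1(E)$, we get for all $\varepsilon > 0$

$$t \int_K |\nabla f| \le t \int_E \rho_\varepsilon\, |\nabla f| = \sup_{\psi \in C^1_b(E)^d,\ \|\psi\|_\infty \le 1} \int_E \nabla f \cdot t\psi\,\rho_\varepsilon$$

$$\le \sup_{\psi \in C^1_b(E)^d,\ \|\psi\|_\infty \le 1} \int_E \Big(\nabla f \cdot t\psi\,\rho_\varepsilon - f\,\frac{t^2 \rho_\varepsilon^2 |\psi|^2}{4}\Big) + \frac{t^2}{4} \int_E f\,\rho_\varepsilon^2 \le I^{(3)}(f) + \frac{t^2}{4} \int_E f\,\rho_\varepsilon^2.$$

Passing to the limit $\varepsilon \to 0$ using that $f \in L^1(E)$ and then $t \to \infty$, we deduce that $\nabla f$ vanishes on $K$, which precisely means that $\nabla f$ is a measurable function. We have proved $f \in W^{1,1}(E)$. □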
Similarly, we define for two measures $\rho$ and $\eta$ on $E^j$ their (non-normalized) relative Fisher information $I_j(\rho|\eta)$ by

(3.12) $$I_j(\rho|\eta) := \int_{E^j} \frac{|\nabla g|^2}{g}\,d\eta = \int_{E^j} \Big|\nabla \ln \frac{d\rho}{d\eta}\Big|^2 d\rho,$$

where $g = \frac{d\rho}{d\eta}$ if $\rho$ is absolutely continuous with respect to $\eta$. If not, $I_j(\rho|\eta) := +\infty$. The associated normalized quantity is simply $I(\rho|\eta) := \frac{1}{j} I_j(\rho|\eta)$. For a fixed $\eta$, the relative Fisher information has roughly the same properties as the Fisher information. In particular, if $\eta$ has a differentiable density, we have the equality

(3.13) $$I_j(\rho|\eta) = \sup_{\varphi \in C^1_b(E^j)^{dj}} \int_{E^j} \Big(-\varphi \cdot \frac{\nabla \eta}{\eta} - \mathrm{div}\,\varphi - \frac{|\varphi|^2}{4}\Big)\,d\rho.$$

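Lemma 3.6. For any $f \in P(E)$ there holds $I(f^{\otimes j}) = I(f)$.

Proof of Lemma 3.6. If $I(f) < \infty$ then $f \in W^{1,1}(E)$ and also $f^{\otimes j} \in W^{1,1}(E^j)$. The following computation is then meaningful

$$I(f^{\otimes j}) = \frac{1}{j} \int_{E^j} \frac{|\nabla_{E^j} f^{\otimes j}|^2}{f^{\otimes j}} = \int_E \frac{|\nabla_E f|^2}{f} = I(f).$$

Since $I_j(f^{\otimes j}) < \infty$ implies $f^{\otimes j} \in W^{1,1}(E^j)$ and then $f \in W^{1,1}(E)$, we also have $I_j(f^{\otimes j}) = j\,I(f)$ if $I(f) = \infty$. □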

Lemma 3.7. For any $F \in P_{sym}(E^j)$ and $1 \le \ell \le j$, there holds

(i) $I(F_\ell) \le I(F)$.

(ii) The Fisher information is super-additive. It means that

(3.14) $$I_j(F) \ge I_\ell(F_\ell) + I_{j-\ell}(F_{j-\ell}) \quad \text{(non-normalized Fisher information)},$$

with, in the case $I_\ell(F_\ell) + I_{j-\ell}(F_{j-\ell}) < +\infty$, equality only if $F = F_\ell \otimes F_{j-\ell}$.

(iii) If $I(F_1) < +\infty$, the equality $I(F_1) = I(F)$ holds if and only if $F = (F_1)^{\otimes j}$.

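Proof of Lemma 3.7.

Proof of (i). If $I(F) = +\infty$ the conclusion is clear. Otherwise, thanks to the equivalent definition $I^{(3)}$ of the Fisher information and the symmetry assumption on $F$, we have

$$I(F) = \sup_{\psi \in C^1_b(E^j)^{dj}} \frac{1}{j} \int_{E^j} \Big(\psi(x_1,\dots,x_j) \cdot \nabla F - F\,\frac{|\psi(x_1,\dots,x_j)|^2}{4}\Big)$$

$$= \sup_{\psi \in C^1_b(E^j)^{d}} \int_{E^j} \Big(\psi(x_1,\dots,x_j) \cdot \nabla_1 F - F\,\frac{|\psi(x_1,\dots,x_j)|^2}{4}\Big)$$

$$\ge \sup_{\psi \in C^1_b(E^\ell)^{d}} \int_{E^j} \Big(\psi(x_1,\dots,x_\ell) \cdot \nabla_1 F - F\,\frac{|\psi(x_1,\dots,x_\ell)|^2}{4}\Big)$$

$$= \sup_{\psi \in C^1_b(E^\ell)^{d}} \int_{E^\ell} \Big(\psi \cdot \nabla_1 F_\ell - F_\ell\,\frac{|\psi|^2}{4}\Big) = I(F_\ell).$$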

Proof of the superadditivity property (ii). The first proof of that result seems to be the one by Carlen in [16, Theorem 3]. We now sketch another proof that uses the third formulation $I^{(3)}$. We recall that in the definition of $I_j^{(3)}(F)$ the supremum is taken over the $\psi = (\psi_1, \dots, \psi_j)$, with all $\psi_i : E^j \to \mathbb{R}^d$. We now restrict the supremum over the $\psi$ such that:

- The $\ell$ first $\psi_i$ depend only on $(x_1, \dots, x_\ell)$, with the notation $\psi^\ell = (\psi_1, \dots, \psi_\ell)$.

- The $(j-\ell)$ last $\psi_i$ depend only on $(x_{\ell+1}, \dots, x_j)$, with the notation $\psi^{j-\ell} = (\psi_{\ell+1}, \dots, \psi_j)$.

We then have the inequality

$$I_j(F) \ge \sup_{\psi^\ell,\, \psi^{j-\ell}} \int_{E^j} \Big(\nabla_\ell F \cdot \psi^\ell + \nabla_{j-\ell} F \cdot \psi^{j-\ell} - F\,\frac{|\psi^\ell|^2 + |\psi^{j-\ell}|^2}{4}\Big)$$

$$= \sup_{\psi^\ell \in C^1_b(E^\ell)^{\ell d}} \int_{E^\ell} \Big(\nabla F_\ell \cdot \psi^\ell - F_\ell\,\frac{|\psi^\ell|^2}{4}\Big) + \sup_{\psi^{j-\ell} \in C^1_b(E^{j-\ell})^{(j-\ell)d}} \int_{E^{j-\ell}} \Big(\nabla F_{j-\ell} \cdot \psi^{j-\ell} - F_{j-\ell}\,\frac{|\psi^{j-\ell}|^2}{4}\Big)$$

$$= I_\ell(F_\ell) + I_{j-\ell}(F_{j-\ell}).$$

If the inequality is an equality, we use the remark made at the end of Step 1 in the proof of Lemma 3.5: maximizing sequences $\psi^\ell_n$ and $\psi^{j-\ell}_n$ for $I_\ell$ and $I_{j-\ell}$ respectively should converge pointwise towards $2\nabla \ln F_\ell$ (resp. $2\nabla \ln F_{j-\ell}$) up to some subsequence, a.e. on $\{F_\ell \ne 0\}$ (resp. $\{F_{j-\ell} \ne 0\}$). If we have equality, we also must have $(\psi^\ell_n, \psi^{j-\ell}_n) \to 2\nabla \ln F$ on $\{F \ne 0\}$, a set that is included in $\{F_\ell \ne 0\} \times \{F_{j-\ell} \ne 0\}$, and thus

$$\nabla \ln F = (\nabla \ln F_\ell, \nabla \ln F_{j-\ell}) = \nabla \ln(F_\ell \otimes F_{j-\ell}),$$

which implies the claimed equality since $F$ and $F_\ell \otimes F_{j-\ell}$ are probability measures.

The case of equality (iii). Using recursively the superadditivity in that particular case, we get with the notation $F_1 = f$

$$I(f) = I(F) \ge \frac{j-1}{j}\, I(F_{j-1}) + \frac{1}{j}\, I(f) \ge \frac{j-2}{j}\, I(F_{j-2}) + \frac{2}{j}\, I(f) \ge \dots \ge I(f).$$

Therefore, all the inequalities are equalities. We obtain that

$$F = F_{j-1} \otimes f = F_{j-2} \otimes f \otimes f = \dots = f^{\otimes j},$$

by applying recursively the case of equality in (3.14). □
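
Lemma 3.7 can be checked by hand on Gaussian measures. The sketch below is only an illustration, under the assumption that $F$ is the centered Gaussian on $\mathbb{R}^2$ with unit variances and correlation $\rho$ (so $d = 1$, $j = 2$, $\ell = 1$); in that case $\int |\nabla \ln F|^2 F$ equals the trace of the inverse covariance matrix. The function names are hypothetical.

```python
def fisher_gaussian_pair(rho):
    """Non-normalized Fisher information I_2(F) of the centered Gaussian on R^2
    with covariance [[1, rho], [rho, 1]]: trace of the inverse covariance."""
    return 2.0 / (1.0 - rho * rho)

def fisher_gaussian_marginal():
    """Fisher information I_1(F_1) of the marginal, a standard Gaussian on R."""
    return 1.0

def superadditivity_gap(rho):
    """I_2(F) - (I_1(F_1) + I_1(F_1)); nonnegative by (3.14)."""
    return fisher_gaussian_pair(rho) - 2.0 * fisher_gaussian_marginal()

gaps = {rho: superadditivity_gap(rho) for rho in (-0.9, -0.5, 0.0, 0.3, 0.8)}
```

The gap vanishes only for $\rho = 0$, that is when $F = F_1 \otimes F_1$, in agreement with the equality case of (3.14); dividing by $j = 2$ gives $I(F) = 1/(1-\rho^2) \ge 1 = I(F_1)$, which is point (i).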

It is classical and essentially a consequence of the Sobolev inequality and the Rellich-Kondrachov Theorem (together with very standard manipulations on the entropy functional which are similar to the ones presented at the end of the proof of Theorem 4.13) that for $(f_n)$ a sequence of $P(E)$, the conditions

$$f_n \rightharpoonup f \text{ weakly in } P(E), \quad M_k(f_n) \text{ bounded}, \ k > 0, \quad \text{and} \quad I(f_n) \le C$$

imply that $H(f_n) \to H(f)$. A natural question is whether a similar result holds for a sequence $(F^N)$ in $P(E^N)$. Before answering that question affirmatively, we establish a normalized non-relative HWI inequality for a large class of sets $E \subset \mathbb{R}^d$. It is a variant of the famous HWI inequality of Otto-Villani [61] that will be the cornerstone of the argument. Let us mention that its good behaviour in any dimension is of particular importance here, and it is due to the good (separate) behaviours of $H$, $W_2$ and $I$ with respect to the dimension.

Proposition 3.8. Assume that $E \subset \mathbb{R}^d$ is a bi-Lipschitz volume preserving deformation of a convex set of $\mathbb{R}^d$, $d \ge 1$: there exists a convex subset $E_1 \subset \mathbb{R}^d$ and a bi-Lipschitz diffeomorphism $T : E_1 \to E$ which preserves the volume (i.e. its Jacobian is always equal to 1). Then, the normalized non relative HWI inequality holds in $E$: there exists a constant $C_E \in [1, \infty)$ such that

(3.15) $$\forall\, F^N, G^N \in P_2(E^N) \qquad H(F^N) \le H(G^N) + C_E\, W_2(F^N, G^N)\, \sqrt{I(F^N)}.$$

More precisely, the above inequality holds with $C_E := \|\nabla T\|_\infty\, \|\nabla T^{-1}\|_\infty$ where $\|\nabla T\|_\infty := \sup_{v \in E_1} \sup_{|h| \le 1} |\nabla T(v)\, h|$.

Before going to the proof, remark that the class of sets $E$ which are bi-Lipschitz volume preserving deformations of a convex set is rather large. For instance, it is shown in [31, Theorem 5.4] that any star-shaped bounded domain with Lipschitz boundary (and some additional assumptions) is in the previously mentioned class.

Proof of Proposition 3.8. We proceed in three steps.

Step 1. $E = \mathbb{R}^d$. Let us first recall the famous HWI inequality of Otto-Villani. Consider $\rho = e^{-V(x)}\,dx$ a probability measure on $\mathbb{R}^D$ such that $D^2 V \ge 0$. For any probability measures $f_0, f_1 \in P_2(\mathbb{R}^D)$, there holds

(3.16) $$H_D(f_0|\rho) \le H_D(f_1|\rho) + W_2(f_0, f_1)\, \sqrt{I_D(f_0|\rho)},$$

where $H_D$ and $I_D$ stand for the non normalized relative entropy and relative Fisher information defined in (3.4) and (3.12) respectively, and $W_2$ stands for the non normalized quadratic MKW distance in $\mathbb{R}^D$ based on the usual Euclidean norm $|V| = (\sum_{i=1}^D |v_i|^2)^{1/2}$ for any $V = (v_1, \dots, v_D) \in \mathbb{R}^D$. Inequality (3.16) has been proved in [61], see also [72, 73, 60, 9, 23]. We easily deduce the "non relative" inequality (3.15) from the "relative" inequality (3.16). In order to do so, we simply apply the HWI inequality (3.16) in $\mathbb{R}^D$, $D = dN$, with respect to the Gaussian $\gamma_\lambda(v) := (2\pi\lambda)^{-D/2} e^{-|v|^2/(2\lambda)}$, and we get

$$H_D(F^N|\gamma_\lambda) \le H_D(G^N|\gamma_\lambda) + W_2(F^N, G^N)\, \sqrt{I_D(F^N|\gamma_\lambda)}.$$

We write the relative entropy and the relative Fisher information in terms of the non-relative ones, and we get

$$H_D(F^N|\gamma_\lambda) = H_D(F^N) - \int F^N \ln \gamma_\lambda = H_D(F^N) + \frac{D}{2} \log(2\pi\lambda) + \frac{1}{2\lambda} \int |V|^2 F^N,$$

$$I_D(F^N|\gamma_\lambda) = \int F^N \Big|\nabla \ln F^N + \frac{V}{\lambda}\Big|^2 = I_D(F^N) + \frac{2}{\lambda} \int V \cdot \nabla F^N + \frac{1}{\lambda^2} \int |V|^2 F^N = I_D(F^N) - \frac{2D}{\lambda} + \frac{1}{\lambda^2} \int |V|^2 F^N.$$

Inserting this in the relative HWI inequality, simplifying the terms involving $\log(2\pi\lambda)$, letting $\lambda \to +\infty$ and dividing the resulting limit by $N$, we obtain the claimed result.

Step 2. $E \subset \mathbb{R}^d$ is convex. The proof is the same as in the case $E = \mathbb{R}^d$, using that the HWI inequality (3.16) holds in a convex set. We have no precise reference for that last result but all the necessary arguments can be found in [73]. More precisely, [73, Chapter 20] explains that the HWI inequality (3.16) holds when the entropy is displacement convex, while it is proved in [73, Chapters 16 and 17] that the entropy on a convex set $E$ is displacement convex, exactly as on $\mathbb{R}^d$.

Step 3. General case. We choose two absolutely continuous probability measures $F^N$ and $G^N$ on $E^N$, and define the corresponding probability measures $F_1^N$ and $G_1^N$ on $E_1^N$ by

$$F_1^N(v_1, \dots, v_N) := F^N(T(v_1), \dots, T(v_N)) = F^N \circ T^{\otimes N}(V),$$

and the same formula for $G_1^N$. It can be checked that $\nabla_{v_j} F_1^N = {}^t\nabla T(v_j)\, \nabla_{v_j} F^N \circ T^{\otimes N}$, so that $|\nabla_{v_j} F_1^N| \le \|\nabla T\|_\infty\, |\nabla_{v_j} F^N \circ T^{\otimes N}|$. Turning to the Fisher information, we get

$$I(F_1^N) = \frac{1}{N} \int_{E_1^N} \frac{|\nabla F_1^N|^2}{F_1^N} \le \|\nabla T\|_\infty^2\, \frac{1}{N} \int_{E_1^N} \frac{|\nabla F^N \circ T^{\otimes N}|^2}{F^N \circ T^{\otimes N}} = \|\nabla T\|_\infty^2\, I(F^N),$$

where we have used the fact that $T$ preserves the volume.

For the MKW distance, remark that $|(T^{-1})^{\otimes N}(V) - (T^{-1})^{\otimes N}(V')| \le \|\nabla T^{-1}\|_\infty\, |V - V'|$. Therefore,

$$W_2(F_1^N, G_1^N)^2 = \inf_{\pi_1 \in \Pi(F_1^N, G_1^N)} \int |V - V'|^2\, \pi_1(dV, dV') = \inf_{\pi \in \Pi(F^N, G^N)} \int |(T^{-1})^{\otimes N}(V) - (T^{-1})^{\otimes N}(V')|^2\, \pi(dV, dV')$$

$$\le \|\nabla T^{-1}\|_\infty^2 \inf_{\pi \in \Pi(F^N, G^N)} \int |V - V'|^2\, \pi(dV, dV') = \|\nabla T^{-1}\|_\infty^2\, W_2(F^N, G^N)^2.$$

For the entropy, the preservation of volume ensures the equality $H(F_1^N) = H(F^N)$, and a similar one for $G^N$. Finally, using the HWI inequality in $E_1$ proved in step 2 and the above properties, we get

$$H(F^N) = H(F_1^N) \le H(G_1^N) + \sqrt{I(F_1^N)}\, W_2(F_1^N, G_1^N) \le H(G^N) + \|\nabla T\|_\infty\, \|\nabla T^{-1}\|_\infty\, \sqrt{I(F^N)}\, W_2(F^N, G^N),$$

which is exactly the claimed result. □
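
The non relative HWI inequality (3.15) can be tested on explicit examples. The sketch below is only an illustration in the simplest setting $E = \mathbb{R}$, $N = 1$, $C_E = 1$, with $F$ and $G$ one-dimensional Gaussians, for which the entropy $\int F \log F$, the quadratic MKW distance and the Fisher information have the standard closed forms used in the code; the function names are hypothetical.

```python
import math

def entropy(std):
    """H(F) = int F log F for F = N(m, std^2): -(1/2) log(2 pi e std^2)."""
    return -0.5 * math.log(2.0 * math.pi * math.e * std * std)

def w2(m1, s1, m2, s2):
    """Quadratic MKW distance between N(m1, s1^2) and N(m2, s2^2) on R."""
    return math.hypot(m1 - m2, s1 - s2)

def fisher(std):
    """Fisher information of N(m, std^2)."""
    return 1.0 / (std * std)

def hwi_gap(m1, s1, m2, s2):
    """RHS minus LHS of (3.15) with F = N(m1, s1^2), G = N(m2, s2^2), C_E = 1."""
    return entropy(s2) + w2(m1, s1, m2, s2) * math.sqrt(fisher(s1)) - entropy(s1)

grid_means = [-2.0, 0.0, 1.5]
grid_stds = [0.2, 1.0, 3.0]
gaps = [hwi_gap(m1, s1, m2, s2)
        for m1 in grid_means for s1 in grid_stds
        for m2 in grid_means for s2 in grid_stds]
```

For equal means the inequality reduces to $\log(s_2/s_1) \le |s_1 - s_2|/s_1$, which follows from $\log x \le x - 1$, so all gaps on the grid are nonnegative.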

Let us finally prove our main result, Theorem 1.4, which is a consequence of the characterization of Kac's chaos in Theorem 2.4 together with Proposition 3.8.

Proof of Theorem 1.4. We recall that the implication (iii) $\Rightarrow$ (iv) has already been proven in Theorem 3.4. We split the proof into two steps.

Step 1. (i) $\Rightarrow$ (ii). Fix $j \in \mathbb{N}$; there exists a subsequence of $(G^N)$, still denoted by $(G^N)$, and some compatible and symmetric probability measures $F_j \in P(E^j)$, such that $G^N_j \rightharpoonup F_j$ weakly in $P(E^j)$. In particular $F_1 = f$. As a consequence of Lemma 3.5 and Lemma 3.7 point (i), we have

$$I(f) \le I(F_j) \le \liminf I(G^N_j) \le \liminf I(G^N) = I(f).$$

Using now the third point of Lemma 3.7 we deduce $F_j = f^{\otimes j}$. The uniqueness of the limit implies that the whole sequence $G^N$ is in fact $f$-Kac's chaotic.

Step 2. (ii) $\Rightarrow$ (iii). We write twice the normalized non relative HWI inequality of Proposition 3.8, and get

$$|H(G^N) - H(f^{\otimes N})| \le C_E\, W_2(G^N, f^{\otimes N}) \Big(\sqrt{I(G^N)} + \sqrt{I(f^{\otimes N})}\Big).$$
Using the previous inequalities together with the inequality of the Lemma 2.2 
W 2 (G N ,f® N )<C E 2l [M fc (Gf) + M fc (/)] 1 ^ W / 1 (G Ar ,0 1 /2-iA 
we get (1.8) since $M_k(f) \le \sup M_k(G^N_1)$ and $I(f) \le \sup I(G^N)$. □

4. Probability measures on the "Kac's spheres"

We generalize the preceding two sections to the important case of probability measures with support on the "Kac's spheres"

$$\mathcal{K}S_N := \{V = (v_1, \dots, v_N) \in \mathbb{R}^N,\ v_1^2 + \dots + v_N^2 = N\}.$$

We refer to [19] where similar results are obtained in the (even more important) case of probability measures with support on the "Boltzmann's spheres"

$$\mathcal{B}S_N := \{V = (v_1, \dots, v_N) \in (\mathbb{R}^3)^N,\ |v_1|^2 + \dots + |v_N|^2 = N,\ v_1 + \dots + v_N = 0\}.$$

4.1. On uniform probability measures on the Kac's spheres as $N \to \infty$.

Definition 4.1. For any $N \in \mathbb{N}^*$ and $r > 0$, we denote by $\sigma^{N,r}$ the uniform probability measure of $\mathbb{R}^N$ carried by the sphere $S_r^{N-1}$ defined by

$$S_r^{N-1} := \{V \in \mathbb{R}^N;\ |V|^2 = r^2\}.$$

We define $\sigma^N \in P(E^N)$, $E = \mathbb{R}$, the sequence $\sigma^N := \sigma^{N,\sqrt{N}}$ of probability measures uniform on the Kac's spheres

$$\mathcal{K}S_N := S_{\sqrt{N}}^{N-1} := \{V \in \mathbb{R}^N;\ |V|^2 = N\}.$$

We begin with a classical and elementary lemma that we will use several times in the sequel.

Lemma 4.2. (i) For any $1 \le \ell \le N - 1$, there holds

$$\sigma^N_\ell(V) = \Big(1 - \frac{|V|^2}{N}\Big)_+^{\frac{N-\ell-2}{2}}\, \frac{|S_1^{N-\ell-1}|}{N^{\ell/2}\, |S_1^{N-1}|},$$

where we recall that $|S_1^{k-1}| = 2\pi^{k/2}/\Gamma(k/2)$.

(hJ For any fixed t, the sequence {cff)N>N l is bounded in L°° (with Ng = £ + &), in H s 
for any s > (with = N(i,k) large enough) and the exponential moment M 21 /4( 
defined in (2.27) is bounded (uniformly in N). 

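(iii) For any function $\varphi \in C_b(\mathbb{R}^N)$, any $r > 0$ and $1 \le \ell \le N - 1$, there holds

$$\int_{\mathbb{R}^N} \varphi\, d\sigma^{N,r} = \int_{\mathbb{R}^\ell} \Big(\int_{\mathbb{R}^{N-\ell}} \varphi(V, V')\, \sigma^{N-\ell,\sqrt{r^2 - |V|^2}}(dV')\Big)\, \sigma^{N,r}_\ell(V)\, dV,$$

where $V \in \mathbb{R}^\ell$ and $V' \in \mathbb{R}^{N-\ell}$. This precisely means that

$$\sigma^N(dV, dV') = \sigma^N_\ell(dV)\, \sigma^{N-\ell,\sqrt{N - |V|^2}}(dV').$$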

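Proof of Lemma 4.2. (i) One possible definition of $\sigma^{N,r}$ is

$$\sigma^{N,r} := \frac{1}{r^{N-1}\, |S_1^{N-1}|}\, \frac{d}{dr} \mathbf{1}_{B(r)},$$

where the surface $r^{N-1}\, |S_1^{N-1}|$ of the sphere $S_r^{N-1}$ stands for the normalization constant such that $\sigma^{N,r}$ is a probability measure. For any $\varphi \in C_b(E^\ell)$, $1 \le \ell \le N - 1$, we compute

$$\langle \mathbf{1}_{B(\rho)}, \varphi \otimes \mathbf{1}^{\otimes (N-\ell)}\rangle = \int_{\mathbb{R}^\ell} \mathbf{1}_{|V|^2 \le \rho^2}\, \varphi(V) \Big\{\int_{\mathbb{R}^{N-\ell}} \mathbf{1}_{x_{\ell+1}^2 + \dots + x_N^2 \le \rho^2 - |V|^2}\, dx_{\ell+1} \dots dx_N\Big\}\, dV = \omega_{N-\ell} \int_{\mathbb{R}^\ell} \varphi(V)\, (\rho^2 - |V|^2)_+^{\frac{N-\ell}{2}}\, dV,$$

where $\omega_k = |B^k(1)|$ is the volume of the unit ball of $\mathbb{R}^k$. We deduce

$$\langle \sigma^{N,r}, \varphi \otimes \mathbf{1}^{\otimes (N-\ell)}\rangle = \frac{1}{r^{N-1}\, |S_1^{N-1}|}\, \frac{d}{dr}\Big[\omega_{N-\ell} \int_{\mathbb{R}^\ell} \varphi(V)\, (r^2 - |V|^2)_+^{\frac{N-\ell}{2}}\, dV\Big] = \frac{\omega_{N-\ell}\, (N-\ell)}{r^{N-2}\, |S_1^{N-1}|} \int_{\mathbb{R}^\ell} \varphi(V)\, (r^2 - |V|^2)_+^{\frac{N-\ell-2}{2}}\, dV.$$

We conclude using the relation $|S_1^{k-1}| = k\, \omega_k$.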



(ii) The estimates on $\sigma^N_\ell$ are deduced from its explicit expression after some tedious but easy calculations. We only prove the last one, which will be a key argument in the proof of the accurate rate of chaoticity in Theorem 1.5. For any $k \ge 1$ and introducing $n := (N - 4)/2$, we easily estimate



/ 



Vl \ 2k a?(dv) = - 



< 



1 N-2 



2vr iV 



I |2 JV-4 







2 JV-4 



N k+1 J s k ( I - .s ) ds. 



Thanks to k + 1 integrations by parts, we deduce 



|«ir<7f(d«) < N k+L / (1 - zf z n dz 

Jo 



N 



k+l 



k 



n + l 



(1 - z) k ~ l z n+1 dv 



N- 



k+l 



k 



1 



1 



n+l n + k — In + kn + k + l 



and then 



/ e^^af(v)dv < f) * [ H^Cefo) 

^ 1 (2n + 4) fe+1 
Jn + T) 



k=0 

oo 



+ l)...(n + fc + l) 



1 (n + 2) 
< 2y-r ) £ < 6. 

fc=o v ;
(iii) We come back to the proof of (i). We set $m = \ell$ and $n = N - \ell$ and we write



lim — 



lim — 



tp- / 99 

B N (r+h) JB N {r) 



Zn r h—>o h 

1 I 

+— — lim - 

Z7V r h— >0 tl 



.<p 



\v\<r+h J\v'\<^J(r+h) 2 -\v\ 2 J\v\<r J\v'\<y/{r+h) 2 -\v\ 2 



, . v- / / , <P 

\v\<rJ\v'\<s/(r+h) 2 -\v\ 2 J\v\<r J \v' \<y/ r 2 -\v\ 2 



lim — 



Zn r h-*o h 



-h J\v> 



r<\v\<r+h J\v'\<y/(r+h) 2 -\v\ 2 



+ - 



1 



lim 



1 



V- / , <P 

v\ 2 ) J B n {yfr 2 — \v\ 2 ) 



Zn r I Rm h^fO tl 

We invert the integral and the limit on the last line using dominated convergence, since 
the integral on v' are bounded by ||y||oo/\A' 2 — |t?| 2 - The first term is bounded (for any 
< h < r) by 



lim — 



/ / 

Jr<\v\<r+h J\v'\ 



\<f \ < Cn,t II^IIl 00 lim Vh = 0, 



Zjs! iT h-^Q h J r <\v\<r+h J\v'\<V3rh " 



and the second term converges to 



B m (r) ^m+rijr 




dv, 



which is exactly the claimed identity. 

Let us recall the following classical result.

Theorem 4.3. The sequence $\sigma^N$ is $\gamma$-chaotic, where $\gamma$ still stands for the Gaussian distribution $\gamma(dx) = (2\pi)^{-1/2} e^{-x^2/2}\,dx$ on $\mathbb{R}$, and more precisely

(4.1) $$\|\sigma^N_\ell - \gamma^{\otimes \ell}\|_{L^1} \le 2\, \frac{\ell + 3}{N - \ell - 3} \quad \text{for any } 1 \le \ell \le N - 4.$$

The fact that $\sigma^N$ is $\gamma$-chaotic is sometimes called "Poincaré's Lemma". In fact, it should go back to Mehler [51] in 1866. Anyway, we refer to [25, 17] for a bibliographic discussion about this important result, and to [25] for a proof of estimate (4.1). We now give a different quantitative version of "Poincaré's Lemma".
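
The convergence of the marginals of $\sigma^N$ towards the Gaussian can be observed numerically from the explicit formula of Lemma 4.2 (i). The sketch below is only an illustration for the first marginal ($\ell = 1$); it uses the classical expression $\sigma^N_1(v) = \frac{|S_1^{N-2}|}{\sqrt{N}\,|S_1^{N-1}|}(1 - v^2/N)_+^{(N-3)/2}$, and the function names, grids and the uniform (sup norm) comparison are choices made here, not taken from the text.

```python
import math

def log_sphere_area(k):
    """log |S_1^{k-1}| with |S_1^{k-1}| = 2 pi^{k/2} / Gamma(k/2)."""
    return math.log(2.0) + 0.5 * k * math.log(math.pi) - math.lgamma(0.5 * k)

def sigma_first_marginal(v, N):
    """Density of the first marginal of the uniform measure on the Kac sphere."""
    base = 1.0 - v * v / N
    if base <= 0.0:
        return 0.0
    log_const = log_sphere_area(N - 1) - log_sphere_area(N) - 0.5 * math.log(N)
    return math.exp(log_const + 0.5 * (N - 3) * math.log(base))

def gaussian(v):
    """Standard Gaussian density gamma on R."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def total_mass(N, steps=20000):
    """Midpoint rule for the integral of the marginal over [-sqrt(N), sqrt(N)]."""
    r = math.sqrt(N)
    h = 2.0 * r / steps
    return h * sum(sigma_first_marginal(-r + (i + 0.5) * h, N) for i in range(steps))

def sup_gap(N, points=2001):
    """Maximum over a grid of [-5, 5] of |sigma^N_1 - gamma|."""
    grid = (-5.0 + 10.0 * i / (points - 1) for i in range(points))
    return max(abs(sigma_first_marginal(x, N) - gaussian(x)) for x in grid)

gap_by_N = {N: sup_gap(N) for N in (4, 10, 100)}
```

For $N = 3$ the formula gives the uniform density on $[-\sqrt{3}, \sqrt{3}]$ (Archimedes' theorem), and the gap to the Gaussian decreases as $N$ grows.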

Theorem 4.4. There exists a numerical constant $C \in (0, \infty)$ such that

(4.2) $$W_1(\sigma^N, \gamma^{\otimes N}) \le \frac{C}{\sqrt{N}}.$$

Remark 4.5. It is worth observing that it is not clear that one can deduce (4.2) from (4.1) or that the reverse implication holds. In particular, using (4.1) and Theorem 2.4 we obtain an estimate on $W_1(\sigma^N, \gamma^{\otimes N})$ which is weaker than (4.2).

Proof of Theorem 4.4. There is a simple transport map from $\gamma^{\otimes N}$ onto $\sigma^N$ which is given by the radial projection $P : V \mapsto \frac{V}{|V|_2}$ with the notation $|V|_k = (N^{-1} \sum_{i=1}^N |v_i|^k)^{1/k}$ for any $k > 0$ for the normalized distance of order $k$. The fact that it is an admissible map comes from the invariance by rotation of $\gamma^{\otimes N}$ and $\sigma^N$. Is it optimal? It is not obvious because $P(V)$ is not necessarily the point of $\mathcal{K}S_N$ which is the closest to $V \in \mathbb{R}^N$ for the $|\cdot|_1$ distance (for which it costs less to displace in the direction of the axes). However, it may still be optimal for rotational symmetry reasons, but it is less obvious. Nevertheless, it will be sufficient for our estimate. Since



\v\ a 



|p(v)-y|x 

we get as all our distances are normalized 

^1(7^/) < [ \p(y) - v\^® N (dv) 



l 



IVI 



I V|a 
+00 



V\^® N (dV) 







R 



R N e- R2 ' 2 dR 



\S 



N-l\ 



(2tt) n / 2 



\V\ida 



NA 



Using that |V|i < | V 1 2 because of the normalization, we may bound the last integral by 



\VUda N < 



\V\ 2 da N =N~ 1 / 2 . 



Remark that this integral is also equal to -^=Mi{a N ) which can be explicited thanks to 
the formula for of Lemma 4.2. Using this in the previous inequality and performing 
the change of variable R = yNR', we get 



15 



JV-li 



N(2tt) n / 2 J 



N-R\R N - l e- R2 / 2 dR 



I oN—l I 7\r# r+00 

We can simplify the prefactor, using the formula for IS 1 ^ -1 ) and Stirling's formula 



(2tt) n / 2 



r(f )2 JV / 2 - 1 

yfNe N ' 2 



7T 



Turning back to the transportation cost, we get 

W 1 { 1 ® N ,a N )< e -^-[l + 0{l/N)] f°° 
^ Jo 



1 + OQ./N)]. 



l-R\ dR. 



After studying the function $g(r) = r\, e^{(1 - r^2)/2}$, we remark that it is strictly increasing from 0 to 1, then strictly decreasing from 1 to $+\infty$, that its maximum at 1 is 1, and that $g(1 + \varepsilon) = 1 - \varepsilon^2 + O(\varepsilon^3)$. We shall also use the less sharp but exact bound



0(1 + e) < 1 



for $\varepsilon \in [-1/2, \sqrt{2} - 1]$.
We can now cut the previous integral in three parts $\int_0^{1/2} + \int_{1/2}^{\sqrt{2}} + \int_{\sqrt{2}}^{+\infty}$. We bound the first part by

»l/2 



Jl/2 J\f2 



s 1 (\\ N - X 



and the third part by 

»+oo r+oo 



f+oo r+oo 
/ . . . < g{ V2)"- 1 / e~ r2/2 r dr = g{V2) 

Js/2 Js/2 



N - l e- 1 . 



>V2 Jy/2 

For the remaining (middle) part, we perform the change of variable $r = 1 + u/\sqrt{N}$. We get

1/2 W-v/TV/2 VA?; 11 

1 r(V2~l)VN , 2 \N-1 



1 



+oo 



, 2 



< — (^©(iV^ 1 )) / e~-\u\du 

< ^(1 + 0(^-1)) 



Putting all together, we finally get 



WiCt®*,^) < -^={i + o{n~ 1 )) + cVn\ n ', 

v 



with $\lambda = \max(g(\sqrt{2}), g(1/2)) < 0.86$. This implies the claimed inequality. □
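
The elementary facts on $g(r) = r\,e^{(1-r^2)/2}$ used in the proof are easy to check numerically. The sketch below is only an illustration; the grid sizes and variable names are arbitrary choices.

```python
import math

def g(r):
    """g(r) = r exp((1 - r^2) / 2), as in the proof of Theorem 4.4."""
    return r * math.exp((1.0 - r * r) / 2.0)

# lambda = max(g(sqrt(2)), g(1/2)); g(sqrt(2)) = sqrt(2/e) is about 0.8578
lam = max(g(math.sqrt(2.0)), g(0.5))

# Monotonicity on a grid: increasing on [0, 1], decreasing on [1, 4]
left = [g(i / 1000.0) for i in range(0, 1001)]
right = [g(1.0 + 3.0 * i / 1000.0) for i in range(0, 1001)]
increasing_before_one = all(a < b for a, b in zip(left, left[1:]))
decreasing_after_one = all(a > b for a, b in zip(right, right[1:]))

# Expansion g(1 + eps) = 1 - eps^2 + O(eps^3), checked at eps = 0.01
eps = 0.01
expansion_error = abs(g(1.0 + eps) - (1.0 - eps * eps))
```

Since $\log g(1+\varepsilon) = \log(1+\varepsilon) - \varepsilon - \varepsilon^2/2$, the error `expansion_error` is of order $\varepsilon^3/3$.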

Proof of (1.9) in Theorem 1.5. The proof of the last estimate in (1.9) follows from (4.2) and Lemma 4.2-(ii) together with (2.29). □

4.2. Conditioned tensor products on the Kac's spheres. We begin with a sharp version of the local central limit theorem (local CLT) or Berry-Esseen type theorem which will be the cornerstone argument in this section.

Theorem 4.6. Consider $g \in P_3(\mathbb{R}^D) \cap L^p(\mathbb{R}^D)$, $p \in (1, \infty]$, such that

(4.3) $$\int_{\mathbb{R}^D} x\, g(x)\, dx = 0, \quad \int_{\mathbb{R}^D} x \otimes x\, g(x)\, dx = \mathrm{Id}, \quad \int_{\mathbb{R}^D} |x|^3 g(x)\, dx =: M_3.$$

We define the iterated and renormalized convolution by

(4.4) $$g_N(x) := N^{D/2}\, g^{(*N)}(\sqrt{N}\, x).$$

There exists an integer $N(p)$ and a constant $C_{BE} = C(p, k, M_3(g), \|g\|_{L^p})$ such that

(4.5) $$\forall\, N \ge N(p) \qquad \|g_N - \gamma\|_{L^\infty} \le \frac{C_{BE}}{\sqrt{N}}.$$
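
The statement (4.5) can be observed on an explicit example. The sketch below is only an illustration in dimension $D = 1$, under the assumption that $g$ is the uniform density on $[-\sqrt{3}, \sqrt{3}]$ (zero mean, unit variance, bounded, so $p = \infty$); the $N$-fold convolution is then given by the classical Irwin-Hall formula for sums of uniform variables. Function names and the grid are choices made here.

```python
import math

def irwin_hall_pdf(x, n):
    """Density at x of the sum of n independent Uniform(0, 1) variables."""
    if x <= 0.0 or x >= n:
        return 0.0
    total = 0.0
    for k in range(int(math.floor(x)) + 1):
        total += (-1) ** k * math.comb(n, k) * (x - k) ** (n - 1)
    return total / math.factorial(n - 1)

def renormalized_density(x, n):
    """g_N(x) = sqrt(N) g^{(*N)}(sqrt(N) x) for g uniform on [-sqrt(3), sqrt(3)],
    rewritten through the Irwin-Hall density by an affine change of variable."""
    scale = math.sqrt(n / 12.0)
    return scale * irwin_hall_pdf(n / 2.0 + x * scale, n)

def gaussian(x):
    """Standard Gaussian density gamma on R."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def sup_error(n, points=1201):
    """Maximum over a grid of [-6, 6] of |g_N - gamma|."""
    grid = (-6.0 + 12.0 * i / (points - 1) for i in range(points))
    return max(abs(renormalized_density(x, n) - gaussian(x)) for x in grid)

errors = {n: sup_error(n) for n in (1, 2, 4, 8)}
```

On this symmetric example the uniform error decays faster than $N^{-1/2}$, which is compatible with (4.5) since the theorem only gives an upper bound.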

Remark 4.7. Theorem 4.6 is a sharper but less general version of [17, Proposition 26]. The proof follows the proof of [17, Proposition 26] and uses an argument from [17, Proposition 26], see also [45]. The first local CLT results were established in the pioneering works by A. C. Berry [7] and C.-G. Esseen [29] who proved the convergence in $O(1/\sqrt{N})$ uniformly on the distribution function in dimension $D = 1$, see for instance [30, Theorem 5.1, Chapter XVI]. Since that time, many variants of the local CLT have been established corresponding to different regularity assumptions made on the probability measure $g$; we refer the interested reader to the recent works [63], [8], [3] and the references therein.

The proof of Theorem 4.6 uses the following technical lemma, whose proof is postponed until after the proof of the theorem.


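Lemma 4.8. (i) Consider $g \in P_3(\mathbb{R}^D)$ satisfying (4.3). There exists $\delta \in (0, 1)$ such that

$$\forall\, \xi \in B(0, \delta) \qquad |\hat g(\xi)| \le e^{-|\xi|^2/4}.$$

(ii) Consider $g \in P(\mathbb{R}^D) \cap L^p(\mathbb{R}^D)$, $p \in (1, \infty]$. For any $\delta > 0$ there exists $\kappa = \kappa(M_3(g), \|g\|_{L^p}, \delta) \in (0, 1)$ such that

(4.6) $$\sup_{|\xi| \ge \delta} |\hat g(\xi)| \le \kappa(\delta).$$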

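Proof of Theorem 4.6. We follow closely the proof of [17, Theorem 27] which is more general but less precise, and we use a trick that we found in the proof of [34, Theorem 1]. We observe that

$$\hat g_N(\xi) = \big(\hat g(\xi/\sqrt{N})\big)^N, \qquad \hat\gamma(\xi) = \big(\hat\gamma(\xi/\sqrt{N})\big)^N.$$

Because $g \in L^1 \cap L^p$, the Hausdorff-Young inequality implies $\hat g \in L^{p'} \cap L^\infty$ with $p' \in [1, \infty)$, and then $\hat g_N(\xi) = (\hat g(\xi/\sqrt{N}))^N \in L^1$ for any $N \ge p'$. As a consequence we may write

$$|g_N(x) - \gamma(x)| = (2\pi)^{-D} \Big|\int_{\mathbb{R}^D} \big(\hat g_N(\xi) - \hat\gamma(\xi)\big)\, e^{i\xi \cdot x}\, d\xi\Big| \le (2\pi)^{-D} \int_{\mathbb{R}^D} |\hat g_N - \hat\gamma|\, d\xi.$$

We split the above integral between low and high frequencies

$$\|g_N - \gamma\|_{L^\infty} \le \int_{|\xi| \ge \sqrt{N}\delta} |\hat g_N|\, d\xi + \int_{|\xi| \ge \sqrt{N}\delta} |\hat\gamma|\, d\xi + \int_{|\xi| \le \sqrt{N}\delta} |\hat g_N - \hat\gamma|\, d\xi \quad (=: T_1 + T_2 + T_3).$$

For the first term, we have

$$T_1 = \int_{|\xi| \ge \sqrt{N}\delta} |\hat g(\xi/\sqrt{N})|^N\, d\xi = N^{D/2} \int_{|\eta| \ge \delta} |\hat g(\eta)|^N\, d\eta \le \Big(\sup_{|\eta| \ge \delta} |\hat g(\eta)|\Big)^{N - p'} N^{D/2} \int_{|\eta| \ge \delta} |\hat g(\eta)|^{p'}\, d\eta \le \kappa(\delta)^{N - p'}\, N^{D/2}\, C_p\, \|g\|_{L^p}^{p'},$$

with $\delta \in (0, 1)$ given by point (i) of Lemma 4.8, $\kappa(\delta)$ given by point (ii) of Lemma 4.8 and $N \ge p'$. The second term may be estimated in the same way, and we clearly obtain that there exists a constant $C_1 = C_1(D, p, \|g\|_{L^p})$ such that

(4.7) $$T_1 + T_2 \le \frac{C_1}{\sqrt{N}}.$$

Concerning the third term, we write

$$T_3 = \int_{|\xi| \le \sqrt{N}\delta} |\hat g_N(\xi) - \hat\gamma_N(\xi)|\, d\xi,$$

with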


\9n(€) -7jv(Q| 
l£l 3 



,JV 



AT3/2 



1 



ie/^i 3 



N 3 / 2 \Z/VN\ 3 
JV-1 

£ 5(c/^v) fc 7(e/Viv) 



JV-fc-1 



fc=0 



Estimate (i) of Lemma 4.8 implies 

JV-1 



s JV-fc-1 



fc=0 



fc=0 



We deduce 



1 



jV 3 / 2 

1 



sup 

»7 



e s 



^ ^(M 3 (5)+M 3 (7))C7 M . 

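We conclude by gathering the estimates on each term. □

Proof of Lemma 4.8. Thanks to a Taylor expansion, we have

\[ \hat g(\xi) = 1 - \frac{|\xi|^2}{2} + O(|\xi|^3), \]

from which we deduce that there exists $\delta = \delta(M_3(g)) \in (0,1)$ small enough such that

\[ \forall\, \xi \in B(0,\delta), \qquad |\hat g(\xi)| \le 1 - \frac{|\xi|^2}{4} \le e^{-|\xi|^2/4}. \]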

UJ{X) 



1 _„2 

e 

7T 



That is nothing but (i). On the other hand, (ii) is a consequence of [17, Proposition 26,
(iii)]. □
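The two points of Lemma 4.8 and the $N^{-1/2}$ rate of Theorem 4.6 can be probed numerically on one explicit density. The sketch below is only an illustration under stated assumptions: it takes $D = 1$ and the centred exponential law $g(x) = e^{-(x+1)} 1_{x > -1}$ (mean 0, variance 1), a choice made here and not taken from the paper, for which $|\hat g(\xi)| = (1+\xi^2)^{-1/2}$ and $g^{(*N)}$ is a shifted Gamma density. The helper names are hypothetical.

```python
import math

def g_hat_abs(xi: float) -> float:
    """|g^(xi)| for the centred Exp(1) law g(x) = exp(-(x + 1)) on x > -1."""
    return 1.0 / math.sqrt(1.0 + xi * xi)

def rescaled_density(x: float, n: int) -> float:
    """g_N(x) = sqrt(N) g^{(*N)}(sqrt(N) x); g^{(*N)}(y) is the Gamma(N, 1)
    density evaluated at y + N (sum of N exponentials, recentred)."""
    s = n + math.sqrt(n) * x
    if s <= 0.0:
        return 0.0
    log_pdf = (n - 1) * math.log(s) - s - math.lgamma(n)
    return math.sqrt(n) * math.exp(log_pdf)

def gaussian(x: float) -> float:
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def sup_error(n: int) -> float:
    """Grid approximation of ||g_N - gamma||_{L^infty} on [-6, 6]."""
    grid = [k / 100.0 for k in range(-600, 601)]
    return max(abs(rescaled_density(x, n) - gaussian(x)) for x in grid)

# Lemma 4.8 (i): Gaussian-type bound on a small ball B(0, delta).
delta = 0.9
low_freq_ok = all(
    g_hat_abs(delta * k / 1000) <= math.exp(-((delta * k / 1000) ** 2) / 4.0)
    for k in range(1001)
)
# Lemma 4.8 (ii): |g^| is decreasing in |xi| here, so the sup over
# |xi| >= delta is attained at |xi| = delta.
kappa = g_hat_abs(delta)

# Theorem 4.6: sqrt(N) * sup-norm error should stay bounded as N grows.
errors = {n: sup_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(n, err, math.sqrt(n) * err)
```

The printed third column stays of order one while the second column decreases, which is the behaviour predicted by the bound $\|g_N - \gamma\|_{L^\infty} \le C/\sqrt N$; the grid supremum is of course only a lower approximation of the true sup norm.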

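For a given "smooth enough" probability measure $f \in P(E)$, $E = \mathbb{R}$, we define

\[ Z_N(r) := \int_{S^{N-1}(r)} f^{\otimes N}\, d\sigma^{N,r}, \qquad Z'_N(r) := \int_{S^{N-1}(r)} \frac{f^{\otimes N}}{\gamma^{\otimes N}}\, d\sigma^{N,r} = \frac{Z_N(r)}{\gamma^{\otimes N}(r)}. \]

We give a sharp estimate on the asymptotic behavior of $Z'_N$ as $N \to \infty$.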
Theorem 4.9. Consider $f \in P_6(\mathbb{R}) \cap L^p(\mathbb{R})$, $p \in (1,\infty]$, satisfying

(4.8) \[ \int_{\mathbb{R}} f\, v\, dv = 0, \]

and define

(4.9) \[ E := \int_{\mathbb{R}} f\, |v|^2\, dv, \qquad \Sigma := \Big( \int_{\mathbb{R}} (v^2 - E)^2 f(v)\, dv \Big)^{1/2}. \]

Then $Z_N(r)$, $Z'_N(r)$ are well defined for all $r > 0$ and there holds, with the above notations,

(4.10) \[ Z'_N(r)\, \alpha_N(r^2) = \frac{\sqrt 2}{\Sigma}\, \alpha_N(N) \Big( \exp\Big( - \frac{(r^2 - NE)^2}{2 N \Sigma^2} \Big) + \frac{R_N(r)}{\sqrt N} \Big), \]

where

\[ \alpha_N(s) = s^{N/2 - 1} e^{-s/2} \quad \text{and} \quad \|R_N\|_{L^\infty} \le C(p, \|f\|_{L^p}, M_6(f)). \]

As a particular case, there holds

(4.11) := Z' N (VEN) = ^ (l + . 

Proof of Theorem 4.9. We follow the proof of [17, Theorem 14], but use the sharper
estimate proved in Theorem 4.6 (instead of [17, Theorem 27]).

Before going on, let us remark that it is not obvious that $Z_N(f; r)$ is well defined for
all $r > 0$ under our assumption on $f$, which is not necessarily continuous, since we are
restricting $f^{\otimes N}$ to surfaces of $\mathbb{R}^N$. But in fact the product structure of $f^{\otimes N}$ makes it
possible. To see this, take $f$ and $g$ two measurable functions equal almost everywhere,
and call $\mathcal{N}$ the negligible set on which they differ. Then the tensor products $f^{\otimes N}$ and
$g^{\otimes N}$ differ only on the negligible set $\mathcal{M} = \bigcup_i \mathbb{R}^{i-1} \times \mathcal{N} \times \mathbb{R}^{N-i}$. It is not difficult
to see that, because of the particular structure of $\mathcal{M}$, the intersection $\mathcal{M} \cap S^{N-1}(r)$ is also
$\sigma^{N,r}$-negligible for all $r > 0$. Therefore $f^{\otimes N}$ and $g^{\otimes N}$ are equal $\sigma^{N,r}$-almost everywhere on
$S^{N-1}(r)$, and there is no ambiguity in the definition of $Z_N(f; r)$ for all $r > 0$.

We now define the law $h$ of $v^2$ under $f$,

(4.12) \[ h(u) := \frac{1}{2\sqrt u} \big( f(\sqrt u) + f(-\sqrt u) \big)\, 1_{u > 0}, \]

remarking that $h \in P_3(\mathbb{R}) \cap L^q(\mathbb{R})$ with $q > 1$, as has been shown in the proof of [17,
Theorem 14]. Consider $(\mathcal{V}_j)$ a sequence of random variables which is i.i.d. according to $f$.
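On the one hand, the law $s_N(du)$ of the random variable

\[ S_N := \sum_{j=1}^N |\mathcal{V}_j|^2 \]

can be computed by writing

\[ \mathbb{E}(\varphi(S_N)) = \int_0^\infty \varphi(r^2)\, |S_1^{N-1}|\, r^{N-1} \Big( \int_{S^{N-1}(r)} f^{\otimes N}(V)\, \sigma^{N,r}(dV) \Big)\, dr = \int_0^\infty \varphi(u)\, |S_1^{N-1}|\, u^{\frac{N-1}{2}}\, Z_N(\sqrt u)\, \frac{du}{2\sqrt u}, \]

which implies

\[ s_N(u) = \frac12\, |S_1^{N-1}|\, u^{\frac N2 - 1}\, Z_N(\sqrt u). \]

On the other hand, we have $s_N = h^{(*N)}$. Gathering these two identities, we get

(4.13) \[ h^{(*N)}(r^2) = \frac12\, |S_1^{N-1}|\, r^{N-2}\, Z_N(r) = \frac{\pi^{N/2}}{\Gamma(N/2)}\, r^{N-2}\, \frac{e^{-r^2/2}}{(2\pi)^{N/2}}\, Z'_N(r) = \frac{\alpha_N(r^2)\, Z'_N(r)}{\Gamma(N/2)\, 2^{N/2}}. \]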
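Formula (4.12) and identity (4.13) can be sanity-checked in the one case where everything is explicit. The sketch below assumes $f = \gamma$ (the standard Gaussian, an illustrative choice made here), so that $Z'_N \equiv 1$ when $\sigma^{N,r}$ is read as the normalized uniform measure on the sphere, and (4.13) then says that $h^{(*N)}$ is the chi-square density with $N$ degrees of freedom. The function names are hypothetical.

```python
import math

def gauss(v: float) -> float:
    return math.exp(-v * v / 2.0) / math.sqrt(2.0 * math.pi)

def law_of_square(f, u: float) -> float:
    """h(u) from (4.12): density of v^2 when v has density f."""
    if u <= 0.0:
        return 0.0
    r = math.sqrt(u)
    return (f(r) + f(-r)) / (2.0 * r)

def alpha(n: int, s: float) -> float:
    """alpha_N(s) = s^(N/2 - 1) exp(-s/2), as read from Theorem 4.9."""
    return s ** (n / 2.0 - 1.0) * math.exp(-s / 2.0)

def rhs_413_gaussian(n: int, s: float) -> float:
    """Right-hand side of (4.13) with Z'_N = 1 (Gaussian case), at r^2 = s."""
    return alpha(n, s) / (math.gamma(n / 2.0) * 2.0 ** (n / 2.0))

# N = 1: (4.12) applied to the Gaussian must coincide with (4.13).
grid = [0.1 * k for k in range(1, 200)]
max_gap = max(abs(law_of_square(gauss, u) - rhs_413_gaussian(1, u)) for u in grid)

# N = 4: the right-hand side of (4.13) should integrate to 1 (midpoint rule).
step = 0.01
mass = sum(rhs_413_gaussian(4, (k + 0.5) * step) * step for k in range(6000))
print(max_gap, mass)
```

For a non-Gaussian $f$ the same `law_of_square` helper gives $h$, but $h^{(*N)}$ is then no longer explicit, which is exactly where Theorem 4.6 enters in the proof.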




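Let us define $g(u) := \Sigma\, h(E + \Sigma u)$, so that $g \in P_3(\mathbb{R}) \cap L^q(\mathbb{R})$ and

\[ \int_{\mathbb{R}} g(y)\, y\, dy = 0, \qquad \int_{\mathbb{R}} g(y)\, |y|^2\, dy = 1. \]

Applying Theorem 4.6 to $g$ and using the identity $g^{(*N)}(u) = \Sigma\, h^{(*N)}(NE + \Sigma u)$, we
obtain

(4.14) \[ \sup_{r \ge 0} \Big| h^{(*N)}(r^2) - \frac{1}{\sqrt N\, \Sigma}\, \gamma\Big( \frac{r^2 - NE}{\sqrt N\, \Sigma} \Big) \Big| \le \frac{C_{BE}}{N \Sigma}, \]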


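where $C_{BE}$ is the constant given in Theorem 4.6 and associated to $g$. Gathering the
Stirling formula

(4.15) \[ \Gamma(N/2) = \sqrt{\pi N}\, \alpha_N(N)\, 2^{-N/2 + 1}\, \big( 1 + O(N^{-1/2}) \big) \]

with (4.13), (4.14), we obtain

\[ \forall\, r > 0, \qquad \Big| \frac{\alpha_N(r^2)\, Z'_N(r)}{2 \sqrt{\pi N}\, \alpha_N(N)\, (1 + O(N^{-1/2}))} - \frac{1}{\sqrt N\, \Sigma\, \sqrt{2\pi}} \exp\Big( - \frac{(r^2 - NE)^2}{2 N \Sigma^2} \Big) \Big| \le \frac{C_{BE}}{N \Sigma}. \]

Estimate (4.10) readily follows. □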



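For a given $f \in P_6(\mathbb{R}) \cap L^p(\mathbb{R})$, $p > 1$, we define the corresponding sequence of
"conditioned product measures" (according to the Kac's spheres $\mathcal{KS}_N$), written $F^N := [f^{\otimes N}]_{\mathcal{KS}_N}$, by

(4.16) \[ F^N := \frac{f^{\otimes N}}{Z_N(f; \sqrt N)}\, \sigma^N. \]

We show that $(F^N)$ is well defined for $N$ large enough and is $f$-chaotic.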
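Theorem 4.10. Consider $f \in P_6(\mathbb{R}) \cap L^p(\mathbb{R})$, $p > 1$, satisfying

(4.17) \[ \int_{\mathbb{R}} f\, v\, dv = 0 \quad \text{and} \quad \int_{\mathbb{R}} f\, v^2\, dv = 1. \]

The sequence $(F^N)$ of corresponding conditioned product measures is $f$-chaotic, more precisely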



1 



O^,/) := Wi(*i" ,/^) < - \\F e N - f 



< 



Ci 



for some constant $C = C(f) \in (0, \infty)$.

Remark 4.11. The $f$-Kac's chaoticity property of the sequence $F^N = [f^{\otimes N}]_{\mathcal{KS}_N}$ is stated
and proved for smooth densities $f$ in the seminal article by M. Kac [41]. Next, the same
chaoticity property is proved with large generality (on $f$) in [17]. Theorem 4.10 is a
"quantified" version of [17, Theorems 4 & 9] and [41, paragraph 5].

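Proof of Theorem 4.10. As in Theorem 4.9, it is not obvious that $F^N$ is well
defined under our assumption on $f$, which is not necessarily continuous, since we are
restricting $f^{\otimes N}$ to a surface of $\mathbb{R}^N$. But the argument given at the beginning of the proof
of Theorem 4.9 shows in fact that the restriction of $f^{\otimes N}$ to $\mathcal{KS}_N$ is unambiguously defined.
Since Theorem 4.9 implies that $Z_N(f; \sqrt N)$ is finite and nonzero for $N$ large enough, we
deduce that $F^N$ is well defined for $N$ large enough.

Let us fix $\ell \ge 1$ and $N \ge \ell + 1$. Denoting $V = (V_\ell, V_{\ell,N})$, with $V_\ell = (v_j)_{1 \le j \le \ell}$,
$V_{\ell,N} = (v_j)_{\ell+1 \le j \le N}$, we write, thanks to the equality (iii) of Lemma 4.2,

\[ F^N(dV) = \frac{f^{\otimes \ell}(V_\ell)}{Z_N(\sqrt N)}\, f^{\otimes (N-\ell)}(V_{\ell,N})\, \sigma^{N-\ell, \sqrt{N - |V_\ell|^2}}(dV_{\ell,N})\, \sigma^N_\ell(V_\ell)\, dV_\ell, \]

so that, coming back to the notation $V = V_\ell = (v_j)_{1 \le j \le \ell} \in \mathbb{R}^\ell$, we have

\[ F^N_\ell(V) = \frac{Z_{N-\ell}\big(\sqrt{N - |V|^2}\big)}{Z_N(\sqrt N)}\, \sigma^N_\ell(V)\, f^{\otimes \ell}(V) = \theta_{N,\ell}(V)\, f^{\otimes \ell}(V), \]

if we define the quantity $\theta_{N,\ell}$ by

(4.18) \[ \theta_{N,\ell}(V) := (2\pi)^{\ell/2}\, e^{|V|^2/2}\, \frac{Z'_{N-\ell}\big(\sqrt{N - |V|^2}\big)}{Z'_N(\sqrt N)}\, \sigma^N_\ell(V). \]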

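The key point is now to prove that $\theta_{N,\ell}$ goes to 1. Recalling the Stirling formula
$\Gamma(k) = \sqrt{2\pi/k}\, (k/e)^k\, (1 + O(k^{-1}))$, we write $\sigma^N_\ell$ as

\[ \sigma^N_\ell(V) = \frac{|S^{N-\ell-1}|}{|S^{N-1}|}\, \frac{(N - |V|^2)_+^{\frac{N-\ell-2}{2}}}{N^{\frac{N-2}{2}}} = \frac{\alpha_{N-\ell}(N - |V|^2)\, e^{-|V|^2/2}}{N^{-\ell/2}\, \alpha_N(N)\, (2\pi)^{\ell/2}}\, \big(1 + O(\ell^2/N)\big)\, 1_{|V|^2 \le N}, \]

from which we deduce

\[ \theta_{N,\ell}(V) = \frac{Z'_{N-\ell}\big(\sqrt{N - |V|^2}\big)\, \alpha_{N-\ell}(N - |V|^2)}{Z'_N(\sqrt N)\, N^{-\ell/2}\, \alpha_N(N)}\, \big(1 + O(\ell^2/N)\big)\, 1_{|V|^2 \le N} \]
\[ = \frac{\alpha_{N-\ell}(N-\ell)}{N^{-\ell/2}\, \alpha_N(N)}\; \frac{\exp\Big( - \frac{(\ell - |V|^2)^2}{2 (N-\ell) \Sigma^2} \Big) + O\big((N-\ell)^{-1/2}\big)}{1 + O(N^{-1/2})}\, \big(1 + O(\ell^2/N)\big)\, 1_{|V|^2 \le N} \]
(4.19) \[ = \Big\{ \exp\Big( - \frac{(\ell - |V|^2)^2}{2 (N-\ell) \Sigma^2} \Big) + O\big((N-\ell)^{-1/2}\big) \Big\}\, \big(1 + O(\ell^2/N)\big)\, 1_{|V|^2 \le N}, \]

where we have successively used (4.10), (4.11), the definition of $\alpha_{N-\ell}(N - \ell)$, and a calculation
yielding

\[ \frac{\alpha_{N-\ell}(N - \ell)}{N^{-\ell/2}\, \alpha_N(N)} = 1 + O(\ell^2/N). \]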
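The last calculation can be checked numerically. The sketch below assumes the reading $\alpha_N(s) = s^{N/2-1} e^{-s/2}$ of the notation of Theorem 4.9 and evaluates the ratio $\alpha_{N-\ell}(N-\ell) / (N^{-\ell/2} \alpha_N(N))$ through logarithms to avoid overflow; the constant 2 used in the comparison is an arbitrary illustrative choice, not a constant from the paper.

```python
import math

def log_alpha(n: int, s: float) -> float:
    """log of alpha_n(s) = s^(n/2 - 1) * exp(-s/2)."""
    return (n / 2.0 - 1.0) * math.log(s) - s / 2.0

def ratio(n: int, ell: int) -> float:
    """alpha_{n-ell}(n-ell) / (n^(-ell/2) * alpha_n(n)), computed in log scale."""
    log_ratio = (
        log_alpha(n - ell, float(n - ell))
        - log_alpha(n, float(n))
        + (ell / 2.0) * math.log(n)
    )
    return math.exp(log_ratio)

table = {(n, ell): ratio(n, ell) for n in (100, 1000, 10000) for ell in (1, 2, 3)}
for (n, ell), value in sorted(table.items()):
    # |ratio - 1| * N / ell^2 should remain bounded as N grows
    print(n, ell, value, abs(value - 1.0) * n / ell**2)
```

A first-order expansion of the logarithm gives $\log(\text{ratio}) = (\ell^2 + 4\ell)/(4N) + O(\ell^3/N^2)$, which is consistent with the $1 + O(\ell^2/N)$ behaviour for fixed $\ell \ge 1$.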

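It implies in particular the two following estimates on $\theta_{N,\ell}$, which will also be very useful
in the proof of the next theorems:

(4.20) \[ \theta_{N,\ell}(V) \le C\, 1_{|V|^2 \le N}, \qquad |\theta_{N,\ell}(V) - 1| \le \frac{C \ell^2}{N^{1/2}} + C\, \frac{|V|^4}{N^{1/2}}\, 1_{|V| \ge N^{1/8}}. \]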
Once they are proven, the conclusion follows since from the second one 
\\Ff-f®% = \\(0 N ,e-l)f®% 
It only remains to prove the estimates (4.20). The first uniform estimate in (4.20)
is clear from 4.19 since ||#tv^||oo and 8\ t are also uniformly bounded. For the second 
estimate, we first control 



\0n,iW) - !| - \ Q N/i V ) ~ M 1 \v\<N 1 /* + \®N,t(V) ~ l \ l\v\>NV a 

I — lyp \ 2 it/i4 



< 



CI 2 , _ |F| 4 



) + 0(N- 1 / 2 )\l lvl < Nl/ s + C^l 



- ^1/2 1 |V|<ATV8 + C-^ I yjl| y |> Jv i/ 8 , 
which implies a similar bound for since 

1)2 

< c\e l Ni (v)-i\ + c 



< 



N 

ce , „ |y| 4 



+ 67 ^T72 1 |V|>A^V8. 



ATl/2 

This concludes the proof. □ 

Proof of (1.10) in Theorem 1.5. The proof of the last two estimates in (1.10) follows
from Theorem 4.10 together with (2.18) and (2.19). □

4.3. Improved chaos for conditioned tensor products on the Kac's spheres. In
this section, we aim to prove rates of chaoticity for stronger notions of chaos for the sequence
$(F^N)$ defined in the preceding section. Let us first recall the notions of entropy chaos and
Fisher information chaos in the context of the "Kac's spheres", as they have already been
defined in the introduction. For $f \in P(E)$ smooth enough, we define the usual relative
entropy and usual relative Fisher information

\[ H(f|\gamma) := \int_E u \log u\; \gamma\, dv, \qquad I(f|\gamma) := \int_E \frac{|\nabla u|^2}{u}\, \gamma\, dv, \qquad u := f/\gamma, \]

and similarly for $G^N \in P_{sym}(\mathcal{KS}_N)$, we define the (normalized) relative entropy and
relative Fisher information

\[ H(G^N|\sigma^N) := \frac1N \int_{\mathcal{KS}_N} g^N \log g^N\, d\sigma^N, \qquad I(G^N|\sigma^N) := \frac1N \int_{\mathcal{KS}_N} \frac{|\nabla_\sigma g^N|^2}{g^N}\, d\sigma^N, \]

where $g^N := dG^N/d\sigma^N$ stands for the Radon-Nikodym derivative of $G^N$ with respect to $\sigma^N$.

Definition 4.12. We say that a sequence $(G^N)$ of $P(\mathcal{KS}_N)$ is

i) $f$-entropy chaotic if $G^N_1 \rightharpoonup f$ and

\[ H(G^N|\sigma^N) \to H(f|\gamma), \]

ii) $f$-Fisher information chaotic if $G^N_1 \rightharpoonup f$ and

\[ I(G^N|\sigma^N) \to I(f|\gamma). \]

It is worth emphasizing again that our definition is slightly different from (weaker than) the
corresponding definition in [17]. But they are in fact equivalent, as we shall see in the next
section (Theorem 4.19).
Theorem 4.13. For any $f \in P_6(\mathbb{R}) \cap L^p(\mathbb{R})$, $p > 1$, satisfying the moment assumptions
(4.17) of Theorem 4.10, the corresponding conditioned product sequence of measures
$(F^N)$ defined by (4.16) is $f$-entropy chaotic. More precisely, there exists $C =
C(p, \|f\|_{L^p}, M_6(f))$ such that

(4.21) \[ |H(F^N|\sigma^N) - H(f|\gamma)| \le \frac{C}{\sqrt N}. \]



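Proof of Theorem 4.13. With the notation $F^N := [f^{\otimes N}]_{\mathcal{KS}_N}$, we write for any $N \ge 1$

\[ H(F^N|\sigma^N) = \frac1N \int_{\mathcal{KS}_N} \Big( \log \frac{f^{\otimes N}}{\gamma^{\otimes N}} \Big)\, F^N - \frac1N \log Z'_N(f). \]

Thanks to the bound (4.10) on $Z'_N(f)$, which implies that $(Z'_N(f))$ is bounded, we deduce by symmetry

\[ H(F^N|\sigma^N) = \int_{\mathbb{R}} F^N_1 \Big( \log \frac{f}{\gamma} \Big) + O(1/N). \]

Recalling the notation $\theta_N := \theta_{N,1}$ defined in (4.18) and the estimates (4.20) it satisfies,
we may then write

\[ H(F^N|\sigma^N) = H(f|\gamma) + T + O(1/N), \qquad T := \int_{\mathbb{R}} (\theta_N - 1)\, f\, \Big( \log \frac{f}{\gamma} \Big), \]

with

\[ |T| \le C \int_{\mathbb{R}} |\theta_N - 1|\, f\, (1 + |v|^2)\, dv + \int_{\mathbb{R}} |\theta_N - 1|\, f\, |\log f|\, dv =: T_1 + T_2. \]

In order to deal with $T_1$, we use the second estimate of (4.20) and get

\[ T_1 \le \frac{C}{\sqrt N} \int_{\mathbb{R}} (1 + |v|^4)(1 + |v|^2)\, f\, dv \le \frac{C\, (1 + M_6(f))}{\sqrt N}. \]

In order to deal with $T_2$, we make the more sophisticated (but standard) splitting: for any
$N, R, M \ge 1$, we write

\[ T_2 \le \int_{B_R} |\theta_N - 1|\, f\, |\log f| + C \int_{B_R^c} f\, |\log f| \]
\[ \le \sup_{B_R} |\theta_N - 1|\; C_f + C \int_{B_R^c} f (\log f)_+ 1_{f \ge M} + C \int_{B_R^c} f (\log f)_+ 1_{M \ge f \ge 1} \]
\[ \quad + C \int_{B_R^c} f (\log f)_- 1_{1 \ge f \ge e^{-|v|^2}} + C \int_{B_R^c} f (\log f)_- 1_{e^{-|v|^2} \ge f \ge 0}. \]

For the second term, we write $f (\log f)_+ \le f^{(1+p)/2} \le f^p / M^{(p-1)/2}$ on $\{f \ge M\}$. For the
third term, we write $f (\log f)_+ \le f \log M \le f (\log M) |v|^6 / R^6$ on $\{f \le M, |v| \ge R\}$. For
the fourth term, we write $\log f \ge -|v|^2$ on $\{f \ge \exp(-|v|^2)\}$, and thus $f (\log f)_- \le f |v|^2 \le
f |v|^6 / R^4$ on $\{1 \ge f \ge e^{-|v|^2}, |v| \ge R\}$. For the last term, we write $f (\log f)_- \le 4 \sqrt f$ on
$\{0 \le f \le 1\}$, and thus $f (\log f)_- \le 4 e^{-|v|^2/2}$ on $\{e^{-|v|^2} \ge f \ge 0, |v| \ge R\}$. We deduce

\[ T_2 \le C_f \sup_{B_R} |\theta_N - 1| + C \Big( \frac{1}{M^{(p-1)/2}} + \frac{\log M}{R^6} + \frac{1}{R^4} + e^{-R^2/2} \Big) \le \frac{C(\|f\|_{L^p}, M_6(f))}{N^{1/2}}, \]

with the choice $R = N^{1/8}$ (which allows one to use the second estimate of (4.20)), and then
$M^{(p-1)/2} = R^6$. □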

Before stating a similar result with the Fisher information, we introduce a notation:
the gradient on the Kac's spheres $\mathcal{KS}_N$ will be denoted by $\nabla_\sigma$,

\[ \nabla_\sigma F(V) := P_{V^\perp} \nabla F(V) = \Big( \mathrm{Id} - \frac{V \otimes V}{|V|^2} \Big) \nabla F(V) = \nabla F(V) - \frac{V \cdot \nabla F(V)}{|V|^2}\, V, \]

if $F$ is a smooth function on $\mathbb{R}^N$. $P_{V^\perp}$ stands for the projection on the hyperplane perpendicular
to $V$. We will use many times that


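(4.22) \[ \nabla \Big[ F\Big( \frac{V}{|V|} \Big) \Big] = \frac{1}{|V|}\, P_{V^\perp} \nabla F \Big( \frac{V}{|V|} \Big) = \frac{1}{|V|}\, \nabla_\sigma F \Big( \frac{V}{|V|} \Big). \]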

Theorem 4.14. For any $f \in P_6(\mathbb{R})$ satisfying the moment assumptions (4.17) of Theorem
4.10, the corresponding conditioned product sequence of measures $(F^N)$ defined by
(4.16) satisfies

\[ \sup_N I(F^N|\sigma^N) < +\infty \]

if $I(f) < +\infty$. If moreover

\[ \int_{\mathbb{R}} \frac{f'(v)^2}{f(v)}\, \langle v \rangle^2\, dv < +\infty, \]

the sequence $(F^N)$ is Fisher information chaotic.

Proof of Theorem 4.14. We only prove the second point. The first point (boundedness
of the Fisher information) can be deduced from the same proof. It suffices in fact
to use the simple bound $|\nabla_\sigma G| \le |\nabla G|$ instead of equality (4.23).

Remark also that the bound on the Fisher information implies that $f$ is continuous
and uniformly bounded since $E = \mathbb{R}$. Therefore, the $L^p$ (for $p > 1$) assumption, which
is necessary in Theorem 4.10, is implied by our bound on the Fisher information. We can
therefore apply the estimates (4.20) on the quantity $\theta_{N,\ell}$ for $\ell = 1, 2$ defined in (4.18). They
imply in particular that $\|\theta_{N,\ell}\|_\infty$ is uniformly bounded and that $\theta_{N,\ell}$ converges pointwise
towards 1. We start with the formula



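\[ I(F^N|\sigma^N) = \frac1N \int_{\mathcal{KS}_N} \Big| \nabla_\sigma \ln \frac{f^{\otimes N}}{\gamma^{\otimes N}} \Big|^2\, F^N(dV). \]

As $\nabla_\sigma$ is the projection on the Kac's spheres of the usual gradient, we have from (4.22),
for any function $G$ on $\mathbb{R}^N$ and any $V \in \mathcal{KS}_N$,

(4.23) \[ |\nabla_\sigma G(V)|^2 = |\nabla G(V)|^2 - \frac1N\, |V \cdot \nabla G(V)|^2. \]

Using this with $G = \ln (f^{\otimes N} / \gamma^{\otimes N})$ in the Fisher information formula, it comes

\[ I(F^N|\sigma^N) = \frac1N \int_{\mathcal{KS}_N} \Big| \nabla \ln \frac{f^{\otimes N}}{\gamma^{\otimes N}} \Big|^2\, F^N(dV) - \frac{1}{N^2} \int_{\mathcal{KS}_N} \Big| V \cdot \nabla \ln \frac{f^{\otimes N}}{\gamma^{\otimes N}} \Big|^2\, F^N(dV). \]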
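Identity (4.23) is the Pythagorean relation for the orthogonal projection onto the tangent space of the sphere, specialised to $|V|^2 = N$. The following sketch checks it on a random point of the Kac's sphere; the dimension, the random seed and the helper name `sphere_gradient` are arbitrary choices made for this illustration.

```python
import math
import random

def sphere_gradient(V, grad):
    """Project grad onto the hyperplane orthogonal to V:
    grad - (V . grad / |V|^2) V."""
    norm2 = sum(v * v for v in V)
    dot = sum(v * g for v, g in zip(V, grad))
    return [g - dot * v / norm2 for v, g in zip(V, grad)]

random.seed(0)
N = 7
raw = [random.gauss(0.0, 1.0) for _ in range(N)]
scale = math.sqrt(N) / math.sqrt(sum(x * x for x in raw))
V = [scale * x for x in raw]                          # point with |V|^2 = N
grad = [random.gauss(0.0, 1.0) for _ in range(N)]     # stands for grad G(V)

tangential = sphere_gradient(V, grad)
lhs = sum(t * t for t in tangential)                  # |grad_sigma G|^2
rhs = sum(g * g for g in grad) - sum(v * g for v, g in zip(V, grad)) ** 2 / N
orthogonality = sum(t * v for t, v in zip(tangential, V))
print(lhs, rhs, orthogonality)
```

The same computation also shows the simple bound $|\nabla_\sigma G| \le |\nabla G|$ used for the first point of Theorem 4.14, since the subtracted term is nonnegative.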
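Recalling that $F^N_1 = f\, \theta_{N,1}$ from (4.18), by symmetry, the first term in the right-hand side
is equal to

(4.24) \[ \int_{\mathbb{R}} \Big| \partial_v \ln \frac{f(v)}{\gamma(v)} \Big|^2 F^N_1(dv) = I(f|\gamma) + \int_{\mathbb{R}} \Big| \frac{f'(v)}{f(v)} + v \Big|^2 (\theta_{N,1}(v) - 1)\, f(v)\, dv. \]

The last term goes to zero from the hypothesis on $f$, the uniform bound $|\theta_{N,1}| \le C$ and
the pointwise convergence of $\theta_{N,1}$ to 1. To handle the second term in the RHS of (4.24),
we compute

\[ \frac{1}{N^2} \Big| V \cdot \nabla \ln \frac{f^{\otimes N}}{\gamma^{\otimes N}} \Big|^2 = \frac{1}{N^2} \Big( \sum_{i=1}^N v_i \Big( \ln \frac f\gamma \Big)'(v_i) \Big)^2 = \frac{1}{N^2} \sum_{i=1}^N v_i^2 \Big[ \Big( \ln \frac f\gamma \Big)'(v_i) \Big]^2 + \frac{1}{N^2} \sum_{i \ne j} v_i v_j \Big( \ln \frac f\gamma \Big)'(v_i) \Big( \ln \frac f\gamma \Big)'(v_j). \]

After integration, it comes thanks to the symmetry of $F^N$

\[ \frac{1}{N^2} \int_{\mathcal{KS}_N} \Big| V \cdot \nabla \ln \frac{f^{\otimes N}}{\gamma^{\otimes N}} \Big|^2 F^N(dV) = \frac1N \int_{\mathbb{R}} v^2 \Big[ \Big( \ln \frac f\gamma \Big)'(v) \Big]^2 F^N_1(dv) + \frac{N-1}{N} \int_{\mathbb{R}^2} v_1 v_2 \Big( \ln \frac f\gamma \Big)'(v_1) \Big( \ln \frac f\gamma \Big)'(v_2)\, F^N_2(dv_1, dv_2). \]

Using the uniform bound $F^N_1(v) = \theta_{N,1}(v) f(v) \le C f(v)$, and the hypothesis on $f$, we
obtain that the first term of the r.h.s. is bounded by $C/N$. The second term, denoted by
$R_2(N)$, is equal to

\[ R_2(N) = \frac{N-1}{N} \int_{\mathbb{R}^2} v_1 v_2 \Big( \ln \frac f\gamma \Big)'(v_1) \Big( \ln \frac f\gamma \Big)'(v_2)\, f(v_1) f(v_2)\, dv_1 dv_2 \]
\[ \quad + \frac{N-1}{N} \int_{\mathbb{R}^2} v_1 v_2 \Big( \ln \frac f\gamma \Big)'(v_1) \Big( \ln \frac f\gamma \Big)'(v_2)\, (\theta_{N,2}(v_1, v_2) - 1)\, f(v_1) f(v_2)\, dv_1 dv_2 \]
\[ = \frac{N-1}{N} \Big( \int_{\mathbb{R}} \big( v f'(v) + v^2 f(v) \big)\, dv \Big)^2 + R_3(N) = R_3(N), \]

after an integration by parts and because of the equality $\int v^2 f(dv) = 1$. The term $R_3(N)$
goes to zero by dominated convergence since

\[ \int_{\mathbb{R}^2} \Big| v_1 v_2 \Big( \ln \frac f\gamma \Big)'(v_1) \Big( \ln \frac f\gamma \Big)'(v_2) \Big|\, f(v_1) f(v_2)\, dv_1 dv_2 = \Big( \int_{\mathbb{R}} |v|\, \Big| \Big( \ln \frac f\gamma \Big)'(v) \Big|\, f(v)\, dv \Big)^2 \le I(f|\gamma) \int v^2\, df. \]

This concludes the proof. □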



4.4. Chaos for arbitrary sequence of probability measures on the Kac's spheres.
In this last section, we aim to present the relationship between Kac's chaos, entropy chaos
and Fisher information chaos in the Kac's spheres framework.

We begin with a result which is the analogue, for probability measures on the Kac's
spheres, of the lower semicontinuity of the entropy and Fisher information already established
on product spaces.

Theorem 4.15. For any sequence $(G^N)$ of $P(\mathcal{KS}_N)$ such that $G^N_j \rightharpoonup G_j$ weakly in $P(E^j)$,
there holds

\[ H(G_j|\gamma^{\otimes j}) \le \liminf H(G^N|\sigma^N), \qquad I(G_j|\gamma^{\otimes j}) \le \liminf I(G^N|\sigma^N). \]

For the proof, we shall need the following integration by parts formula on the Kac's
spheres, whose proof is postponed to the end of the proof of Theorem 4.15.

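Lemma 4.16. Assume that $F$ (resp. $\Phi$) is a function (resp. vector field in $\mathbb{R}^N$) on
the Kac's spheres $\mathcal{KS}_N$ with integrable gradient. Then the following integration by parts
formula holds

(4.25) \[ \int_{\mathcal{KS}_N} \Big[ \nabla_\sigma F(V) \cdot \Phi(V) + F(V)\, \mathrm{div}_\sigma \Phi(V) - \frac{N-1}{N}\, F(V)\, \Phi(V) \cdot V \Big]\, d\sigma^N(V) = 0, \]

where $\mathrm{div}_\sigma$ stands for the divergence on the sphere, given by

\[ \mathrm{div}_\sigma \Phi(V) := \sum_{i=1}^N \nabla_\sigma \Phi_i(V) \cdot e_i = \mathrm{div}\, \Phi(V) - \sum_{i,j=1}^N \frac{v_i v_j}{|V|^2}\, \partial_j \Phi_i(V), \]

where the last formula is useful only if $\Phi$ is defined on a neighborhood of the sphere.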

Proof of Theorem 4.15. We refer to [17, Theorem 17] for a proof of the inequality in- 
volving the entropy and we give only the proof of the second inequality, which in fact relies 
on the characterization 1^ of the Fisher information. Precisely, the previous Lemma 4.16 
can be used to get a reformulation of the Fisher information relative to a N on the sphere 



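\[ I_N(G^N|\sigma^N) := \int_{\mathcal{KS}_N} |\nabla_\sigma \ln g^N|^2\, G^N(dV) = \sup_{\Phi \in C^1_b(\mathbb{R}^N)^N} \int_{\mathcal{KS}_N} \Big( \nabla_\sigma \ln g^N \cdot \Phi - \frac{|\Phi|^2}{4} \Big)\, G^N(dV) \]
(4.26) \[ = \sup_{\Phi \in C^1_b(\mathbb{R}^N)^N} \int_{\mathcal{KS}_N} \Big( \frac{N-1}{N}\, \Phi(V) \cdot V - \mathrm{div}_\sigma \Phi(V) - \frac{|\Phi(V)|^2}{4} \Big)\, G^N(dV). \]

Next, applying the equality (3.13) to the probability measure $\gamma^{\otimes j}$, we get that for any
$\varepsilon > 0$, we can choose a $\varphi \in C^1_b(\mathbb{R}^j)^j$ such that

\[ I(G_j|\gamma^{\otimes j}) - \varepsilon \le \frac1j \int_{\mathbb{R}^j} \Big( \varphi \cdot V_j - \mathrm{div}\, \varphi - \frac{|\varphi|^2}{4} \Big)\, G_j(dV_j). \]

Remark that the r.h.s. is quite similar to (4.26). With the notation $N = nj + r$, $0 \le r < j$
and $V_N = (V_{j,1}, \dots, V_{j,n}, V_r)$, we define

\[ \Phi(V_N) := \big( \varphi(V_{j,1}), \dots, \varphi(V_{j,n}), 0 \big) \in C^1_b(\mathbb{R}^N)^N, \]

and use it in the equality (4.26). We get

\[ I_N(G^N|\sigma^N) \ge \int_{\mathcal{KS}_N} \Big( \frac{N-1}{N}\, \Phi(V_N) \cdot V_N - \mathrm{div}_\sigma \Phi(V_N) - \frac{|\Phi(V_N)|^2}{4} \Big)\, G^N(dV_N), \]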
where

if $\nabla \varphi$ decreases sufficiently quickly at infinity. Passing to the limit, we get

\[ \liminf_{N \to +\infty} I(G^N|\sigma^N) \ge \frac1j \int_{\mathbb{R}^j} \Big( \varphi \cdot V_j - \mathrm{div}\, \varphi - \frac{|\varphi|^2}{4} \Big)\, G_j(dV_j) \ge I(G_j|\gamma^{\otimes j}) - \varepsilon, \]

which concludes the proof. □

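Proof of Lemma 4.16. As before, we will use the normalized norm $|V|_2 := \sqrt{\frac1N \sum_j v_j^2}$.
Choosing any smooth function $q$ on $(0, +\infty)$ with compact support, we define

\[ W(V) := q(|V|_2)\, F\Big( \frac{V}{|V|_2} \Big)\, \Phi\Big( \frac{V}{|V|_2} \Big). \]

Its divergence is given by

\[ \mathrm{div}\, W(V) = \frac{q'(|V|_2)}{N}\, F\Big( \frac{V}{|V|_2} \Big)\, \Phi\Big( \frac{V}{|V|_2} \Big) \cdot \frac{V}{|V|_2} + \frac{q(|V|_2)}{|V|_2} \Big[ \nabla_\sigma F \cdot \Phi + F\, \mathrm{div}_\sigma \Phi \Big] \Big( \frac{V}{|V|_2} \Big). \]

Integrating this equality, and using polar coordinates, we get

\[ 0 = \Big( \int_{\mathcal{KS}_N} \big[ \nabla_\sigma F(V) \cdot \Phi(V) + F(V)\, \mathrm{div}_\sigma \Phi(V) \big]\, \sigma^N(dV) \Big) \Big( \int_0^\infty q(r)\, r^{N-2}\, dr \Big) + \frac1N \Big( \int_{\mathcal{KS}_N} F(V)\, \Phi(V) \cdot V\, \sigma^N(dV) \Big) \Big( \int_0^\infty q'(r)\, r^{N-1}\, dr \Big). \]

Since $\int_0^\infty q'(r)\, r^{N-1}\, dr = -(N-1) \int_0^\infty q(r)\, r^{N-2}\, dr$, we obtain

\[ \int_{\mathcal{KS}_N} \Big[ \nabla_\sigma F(V) \cdot \Phi(V) + F(V)\, \mathrm{div}_\sigma \Phi(V) - \frac{N-1}{N}\, F(V)\, \Phi(V) \cdot V \Big]\, d\sigma^N(V) = 0, \]

which is the claimed result. □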

The next theorem will be the key estimate in the proof of the variant of Theorem 1.4 
adapted to the Kac's spheres. It relies on the HWI inequality on the Kac's spheres, which 
allows to quantify the convergence of the relative entropy. 

Theorem 4.17. Consider (G N ) a sequence oJP()CSn) which is f -chaotic, f G P{E). 
Assume furthermore that 

M k {G N )-t <K fork>6, and I(G N \a N ) < K. 

Then f satisfies Mf~(f) < oc, 1(f) < 00, and (G ) is f -entropy chaotic. More precisely, 
there exists C\ := C\(K) and for any 72 < gfef a constant (72(72) such that 

\H(G N \a N ) - H(f\-y)\ < Ci (Wi(G N , f®")^ + C 2 N^ , 
with 71 := 1/2 - 1/k. 
The proof uses the following estimate.

Theorem 4.18. ([18, Theorem 2], [5, Theorem 2]). For any sequence $(G^N)$ of $P(\mathcal{KS}_N)$,
there hold

\[ \forall\, 1 \le k \le N, \qquad H(G^N_k|\sigma^N_k) \le 2\, H(G^N|\sigma^N) \quad \text{and} \quad I(G^N_k|\sigma^N_k) \le 2\, I(G^N|\sigma^N). \]
Proof of Theorem 4.17. Step 1. Thanks to Theorem 4.18, we have

\[ I(G^N_1|\sigma^N_1) \le 2K. \]

Using the strong convergence of $\sigma^N_1$ to $\gamma$ stated in Lemma 4.2, we pass to the (inferior) limit and
get

\[ I(f|\gamma) \le 2K \quad \text{and then} \quad I(f) \le 2K. \]

Introducing the restriction $F^N = f^{\otimes N} / Z(\sqrt N)\, \sigma^N$ of $f^{\otimes N}$ to $\mathcal{KS}_N$ defined in (4.16) and
using point i) of Theorem 4.14, we get

\[ \sup_N I(F^N|\sigma^N) \le C_2. \]

Step 2. Because the Ricci curvature of the metric space $\mathcal{KS}_N$ is positive (it is $K := (N-1)/N$),
we may use the HWI inequality in weak $CD(K, \infty)$ geodesic spaces (see [73, Theorem
30.21]), which generalizes the standard HWI inequality (3.16) quoted in Proposition 3.8.
However, we have to be careful, because it is now valid with $W_2$ replaced by the MKW
distance constructed with the geodesic distance on the sphere, and not with the distance
induced by the square norm of $\mathbb{R}^N$. Fortunately, both distances are equivalent, and if we
add a constant $\frac\pi2$ in the right-hand side, we can still write the HWI inequality with our
usual distance $W_2$. We then have

\[ H(F^N|\sigma^N) - H(G^N|\sigma^N) \le \frac\pi2\, \sqrt{I(F^N|\sigma^N)}\; W_2(F^N, G^N), \]

and

\[ H(G^N|\sigma^N) - H(F^N|\sigma^N) \le \frac\pi2\, \sqrt{I(G^N|\sigma^N)}\; W_2(F^N, G^N), \]

so that

\[ |H(F^N|\sigma^N) - H(G^N|\sigma^N)| \le C_2\, W_2(F^N, G^N). \]

We rewrite it under the form

\[ |H(G^N|\sigma^N) - H(f|\gamma)| \le C_3 \big[ W_2(G^N, f^{\otimes N}) + W_2(F^N, f^{\otimes N}) \big] + |H(F^N|\sigma^N) - H(f|\gamma)|. \]
For the first term, we have using inequality of Lemma 2.2 

W 2 {G N ,f m ) < 4i^iy 1 (G 7V ,/® JV ) 1 / 2 -Vfc. 

For the second term, we have for any e > 

w 2 (f n j® n ) < AKn^-j) 1 / 2 - 1 ^ 

/ M 1 xl/2-l/fc 

< AK (Ooo(F ; /) + C7 e iV~2+ £ +2/ fc 



< C £ (n 2 (F N ;f)^TT7k +C £ N * 



1/2-1/k 

)2+£+l/fc _j_ (J^ J\j 2 + e + 2/fc 1 

l/4-l/2fc 
< C £ N , 

where we have successively used Lemma 2.2, the inequalities (2.18), (2.19) and Theorem 4.10
in the case $d = 1$ (and then $d' = \max(d, 2) = 2$). The third and last term is bounded by
$C N^{-1/2}$ thanks to Theorem 4.13. □
The lower semicontinuity properties of Theorem 4.15 and Theorem 4.17 allow us to
give a variant of Theorem 1.4 in the framework of probability measures with support on
the Kac's spheres.

Theorem 4.19. Consider $(G^N)$ a sequence of $P_{sym}(\mathcal{KS}_N)$ such that $M_6(G^N_1)$ is bounded
and $G^N_1 \rightharpoonup f$ weakly in $P(\mathbb{R})$.

In the list of assertions below, each one implies the assertion which follows:

(i) $(G^N)$ is $f$-Fisher information chaotic, i.e. $I(G^N|\sigma^N) \to I(f|\gamma)$, $I(f) < \infty$;

(ii) $(G^N)$ is $f$-Kac's chaotic and $I(G^N|\sigma^N)$ is bounded;

(iii) $(G^N)$ is $f$-entropy chaotic, that is $H(G^N|\sigma^N) \to H(f|\gamma)$, $H(f) < \infty$;

(iv) $(G^N)$ is $f$-Kac's chaotic.

Proof of Theorem 4.19. The proof is very similar to that of Theorem 1.4. The implications
i) $\Rightarrow$ ii) and iii) $\Rightarrow$ iv) rely on the l.s.c. properties of Theorem 4.15, and ii) $\Rightarrow$ iii) uses
Theorem 4.17. We omit the details. □

We finally conclude this section with the proof of Theorem 1.6.

Proof of Theorem 1.6. We only deal with the case $j = 1$, but the general case $j \ge 1$
can be managed in a very similar way because we already know that $G^N_j \rightharpoonup f^{\otimes j}$ weakly
in $P(E^j)$ thanks to Theorem 4.17 and Theorem 4.19. With the notations of Theorem 1.6,
we have to prove

\[ H(G^N_1|f) = \int_E \log (G^N_1/f)\, G^N_1 \to 0 \quad \text{as } N \to \infty. \]

First, we observe that since $G^N$ is symmetric and has support on the Kac's spheres,
$M_2(G^N_1) = 1$. Moreover,

\[ I(G^N_1|\sigma^N_1) = \int_E |\nabla \log G^N_1 - \nabla \log \sigma^N_1|^2\, G^N_1 = I(G^N_1) + \int_E \big[ 2 \Delta \log \sigma^N_1 + |\nabla \log \sigma^N_1|^2 \big]\, G^N_1, \]

so that

\[ I(G^N_1) \le I(G^N_1|\sigma^N_1) + \int_E \big( 2 \Delta \log \sigma^N_1 + |\nabla \log \sigma^N_1|^2 \big)_-\, G^N_1. \]

We easily compute

\[ 2 \Delta \log \sigma^N_1 + |\nabla \log \sigma^N_1|^2 = (N-3) \Big\{ - \frac{(2v)^2/N^2}{(1 - v^2/N)^2} - \frac{2/N}{1 - v^2/N} + \frac{N-3}{4}\, \frac{(2v/N)^2}{(1 - v^2/N)^2} \Big\}, \]

and then, for $N$ large enough ($N \ge 13$ suffices),

\[ \big( 2 \Delta \log \sigma^N_1 + |\nabla \log \sigma^N_1|^2 \big)_- = \frac{N-3}{N}\, \frac{\big( 2 - (N-5) v^2/N \big)_+}{(1 - v^2/N)^2} \le \frac{2}{(1 - 1/4)^2} = \frac{32}{9}. \]

Thanks to the boundedness assumption (1.12) we get that $I(G^N_1) \le C$ for some constant
$C \in (0, \infty)$, and then $I(G^N_1|\gamma) \le 2 [I(G^N_1) + M_2(G^N_1)] \le C$.

Next, we introduce the splitting

\[ H(G^N_1|f) = \underbrace{H(G^N_1|\gamma) - H(f|\gamma)}_{=: T_1} + \underbrace{\int_E (f - G^N_1) \log \frac f\gamma}_{=: T_2}, \]

and we show that $T_i \to 0$ for any $i = 1, 2$. For the first term $T_1$, using twice the HWI
inequality we have

\[ |T_1| \le \Big( \sqrt{I(G^N_1|\gamma)} + \sqrt{I(f|\gamma)} \Big)\, W_2(G^N_1, f) \to 0, \]

because of the uniform bound on the Fisher information and of the convergence property
$W_2(G^N_1, f) \to 0$. That last convergence is a consequence of [72, Theorem 7.2 (iii) $\Rightarrow$ (i)],
$G^N_1 \rightharpoonup f$ weakly when $N \to \infty$ and $\langle G^N_1, v^2 \rangle = \langle f, v^2 \rangle$ for any $N \ge 1$ when $k = 2$, and
it is a consequence of [72, Theorem 7.2 (ii) $\Rightarrow$ (i)], $G^N_1 \rightharpoonup f$ weakly as $N \to \infty$ and
$M_k(G^N_1) \le C$ for any $N \ge 1$ when $k > 2$.

Before dealing with the last term, we remark that the bound on the Fisher information
of $f$ implies some regularity, precisely that $\sqrt f$ and then $f$ are $\frac12$-Hölder. Therefore $\ln \frac f\gamma$ is
continuous and satisfies from the assumption (1.13) the bound

\[ \Big| \ln \frac f\gamma \Big| \le \big| \ln \|f\|_\infty \big| + \alpha |v|^{k'} + |\beta| + \frac{v^2}{2} \le C \langle v \rangle^{\max(k', 2)}. \]

We then conclude that $T_2 \to 0$ by using [72, Theorem 7.2 (iii) $\Rightarrow$ (iv)] when $k = 2$ and
[72, Theorem 7.2 (ii) $\Rightarrow$ (iv)] when $k > 2$. □
5. On mixtures according to De Finetti, Hewitt and Savage

In this section we develop a quantitative and qualitative approach concerning
sequences of probability measures of $P_{sym}(E^N)$, $E \subset \mathbb{R}^d$, in the general framework of
convergence to a "mixture of probability measures" (here we do not assume any chaos property).

Depending on the result, we will need some hypotheses on the set $E$ that we will make
precise in each statement. While in the first and second sections the results hold with
great generality, only assuming that

- $E$ is a Borel set of $\mathbb{R}^d$;

we shall assume in the third and fourth sections that

- $E = \mathbb{R}^d$ or $E$ is an open set of $\mathbb{R}^d$ with smooth boundary, in order that the strong
maximum principle and the Hopf lemma hold (we furthermore assume $E$ to be bounded
in the third section);

and we shall also assume in the fourth section that

- the normalized non-relative HWI inequality (3.15) holds in $E$ (e.g. it satisfies the
assumptions of Proposition 3.8).

5.1. The De Finetti, Hewitt and Savage theorem and weak convergence in
$P(E^N)$. We begin by recalling the famous De Finetti, Hewitt and Savage theorem [24, 40],
for which we state a quantified version that is maybe new.

Theorem 5.1. Assume $E \subset \mathbb{R}^d$ is a Borel set. Consider a sequence $(\pi^j)$ of symmetric
and compatible probability measures of $P(E^j)$, that is $\pi^j \in P_{sym}(E^j)$ and $(\pi^j)_\ell = \pi^\ell$
for any $1 \le \ell \le j$, and consider $(\hat\pi^j)$ the associated sequence of empirical distributions
in $P(P(E))$ defined according to (2.7). For any $s > d/2$, the sequence $(\hat\pi^j)$ is a Cauchy
sequence for the distance $W_{H^{-s}}$, and precisely

(5.1) \[ \big[ W_{H^{-s}}(\hat\pi^N, \hat\pi^M) \big]^2 \le 2\, \|\Phi_s\|_\infty \Big( \frac1N + \frac1M \Big), \]

where $\Phi_s$ is the function introduced in Lemma 2.9. In particular, the sequence $(\hat\pi^j)$ converges
towards some $\pi \in P(P(E))$ with the speed $W_{H^{-s}}(\hat\pi^j, \pi) \le (2 \|\Phi_s\|_\infty / j)^{1/2}$. The limit $\pi$ is
characterized by the relations

(5.2) \[ \forall\, j \ge 1, \qquad \pi^j = \pi_j := \int_{P(E)} \rho^{\otimes j}\, \pi(d\rho) \quad \text{in } P_{sym}(E^j), \]

or in other words, with the notations of section 2.1,

(5.3) \[ \forall\, \varphi \in C_b(E^j), \qquad \langle \pi^j, \varphi \rangle = \int_{P(E)} R_\varphi(\rho)\, \pi(d\rho). \]

Reciprocally, for any mixture of probability measures $\pi \in P(P(E))$, the sequence $(\pi_j)$ of
probability measures in $P(E^j)$ defined by the second identity in (5.2) is such that the $\pi_j$
are symmetric and compatible.
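The reciprocal statement can be made concrete on a toy example. The sketch below takes $E = \{0, 1\}$ (a Borel subset of $\mathbb{R}$) and a hypothetical mixture $\pi$ of two Bernoulli laws with equal weights, an example chosen here for illustration only; it builds the marginals $\pi_j = \int \rho^{\otimes j} \pi(d\rho)$ of (5.2) and checks that they are symmetric and compatible, while not being product measures.

```python
from itertools import permutations, product

# Hypothetical mixture pi on P({0,1}): Bernoulli(0.2) and Bernoulli(0.7),
# each with weight 1/2 (parameter -> weight).
weights = {0.2: 0.5, 0.7: 0.5}

def marginal(j: int) -> dict:
    """pi_j = integral of rho^{tensor j} against pi, as a pmf on {0,1}^j."""
    law = {}
    for x in product((0, 1), repeat=j):
        total = 0.0
        for p, w in weights.items():
            prob = 1.0
            for xi in x:
                prob *= p if xi == 1 else 1.0 - p
            total += w * prob
        law[x] = total
    return law

def drop_last(law: dict) -> dict:
    """Marginalize out the last coordinate."""
    out = {}
    for x, mass in law.items():
        out[x[:-1]] = out.get(x[:-1], 0.0) + mass
    return out

pi2, pi3 = marginal(2), marginal(3)

symmetric = all(
    abs(pi3[x] - pi3[tuple(x[i] for i in s)]) < 1e-12
    for x in pi3
    for s in permutations(range(3))
)
projected = drop_last(pi3)
compatible = all(abs(projected[x] - pi2[x]) < 1e-12 for x in pi2)

# pi_2 is not a product measure: P(x1 = 1, x2 = 1) differs from P(x1 = 1)^2.
p_one = pi2[(1, 0)] + pi2[(1, 1)]
not_product = abs(pi2[(1, 1)] - p_one ** 2) > 1e-3
print(symmetric, compatible, not_product)
```

The last check illustrates why no chaos property is assumed in this section: a genuine mixture has exchangeable but correlated coordinates.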

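Proof of Theorem 5.1. We split the proof into two steps.

Step 1. In order to estimate the distance between $\hat\pi^N$ and $\hat\pi^M$ we shall use, as in the proof
of Proposition 2.10, the fact that $\| \cdot \|^2_{H^{-s}}$ is a polynomial on $P(E)$, but we have to choose a
good transference plan. Fortunately, there is at least one simple choice. The compatibility
and symmetry conditions on $(\pi^N)$ tell us that $\pi^{N+M}$ is an admissible transference plan between
$\pi^N$ and $\pi^M$. Using the symmetry of $\pi^{N+M}$ and the isometry between $(E^N/\mathfrak{S}_N, w_1)$ and
$(P_N(E), W_1)$ stated in step 1 in the proof of Proposition 2.14, we will interpret it as
a transference plan $\hat\pi^{N+M}$ on $P_N(E) \times P_M(E)$ between $\hat\pi^N$ and $\hat\pi^M$. More precisely,
$\hat\pi^{N+M} \in P(P(E) \times P(E))$ is defined as the probability measure satisfying

\[ \forall\, \Phi \in C_b(P(E) \times P(E)), \qquad \langle \hat\pi^{N+M}, \Phi \rangle = \int_{E^N \times E^M} \Phi(\mu^N_X, \mu^M_Y)\, \pi^{N+M}(dX, dY). \]

With that transference plan we have

\[ \big[ W_{H^{-s}}(\hat\pi^N, \hat\pi^M) \big]^2 \le \int_{P(E) \times P(E)} \|\rho - \eta\|^2_{H^{-s}}\, \hat\pi^{N+M}(d\rho, d\eta) \]
\[ \le \int_{P(E) \times P(E)} \Big( \int_{\mathbb{R}^{2d}} \Phi_s(x - y)\, \big[ (\rho^{\otimes 2} - \rho \otimes \eta) + (\eta^{\otimes 2} - \eta \otimes \rho) \big](dx, dy) \Big)\, \hat\pi^{N+M}(d\rho, d\eta), \]

with the help of (2.25). We can then compute

\[ \big[ W_{H^{-s}}(\hat\pi^N, \hat\pi^M) \big]^2 \le \int_{E^N \times E^M} \Big( \int_{\mathbb{R}^{2d}} \Phi_s(x - y)\, \big[ (\mu^N_X)^{\otimes 2} - \mu^N_X \otimes \mu^M_Y \big](dx, dy) \Big)\, \pi^{N+M}(dX, dY) \]
\[ \quad + \int_{E^N \times E^M} \Big( \int_{\mathbb{R}^{2d}} \Phi_s(x - y)\, \big[ (\mu^M_Y)^{\otimes 2} - \mu^M_Y \otimes \mu^N_X \big](dx, dy) \Big)\, \pi^{N+M}(dX, dY) \]
\[ = \int_{E^N \times E^M} \Big( \frac{1}{N^2} \sum_{i,i'=1}^N \Phi_s(x_i - x_{i'}) - \frac{1}{NM} \sum_{i=1}^N \sum_{j=1}^M \Phi_s(x_i - y_j) \Big)\, \pi^{N+M}(dX, dY) + (\text{the symmetric term in } Y) \]
\[ = \frac{\Phi_s(0)}{N} + \frac{N-1}{N} \int \Phi_s(x - y)\, \pi^2(dx, dy) - \int \Phi_s(x - y)\, \pi^2(dx, dy) \]
\[ \quad + \frac{\Phi_s(0)}{M} + \frac{M-1}{M} \int \Phi_s(x - y)\, \pi^2(dx, dy) - \int \Phi_s(x - y)\, \pi^2(dx, dy), \]

and we conclude with

\[ \big[ W_{H^{-s}}(\hat\pi^N, \hat\pi^M) \big]^2 \le \Big( \frac1N + \frac1M \Big) \Big( \Phi_s(0) - \int \Phi_s(x - y)\, \pi^2(dx, dy) \Big) \le 2\, \|\Phi_s\|_\infty \Big( \frac1N + \frac1M \Big). \]

The existence of the limit $\pi$ is due to the completeness of $P(P(E))$.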



Step 2. Now it remains to characterize the limit $\pi$. We fix $j \in \mathbb{N}$, we denote by $\pi_j$ its
$j$-th marginal defined thanks to the second identity in (5.2) and by $\hat\pi^N_j = (\hat\pi^N)_j$ the $j$-th
marginal of the empirical probability measure $\hat\pi^N$ as defined in (2.9). We easily compute



1^ »r II 2 



P(E) 



= inf 
nen^.Tr) 

< inf 



P ^Tt N (d P ) - / p® j TT(d P ) 



P(E) 



IP(E) 

y»3 -rj^]Ji(dp,dri) 



H- 



nen(fi N ,ir) Jp(e) 



Next we fix $s > jd/2$ so that, using Sobolev embeddings on $\mathbb{R}^{jd}$, $\|\varphi\|_\infty \le C \|\varphi\|_{H^s}$ for
any $\varphi \in H^s(\mathbb{R}^{jd})$, which implies by duality that $\|\rho\|_{H^{-s}} \le C \|\rho\|_{TV}$ for any $\rho \in P(\mathbb{R}^{jd})$.

Using the Grunbaum lemma 2.8 and the compatibility assumption $\pi^N_j = \pi^j$, we get the
inequality

\\^3 \\N -Aril n\\«rN Zr N \\ s 

\W - *j \\ H - = iFj - TTj \\ H -s < C \\7Tj - TTj \\ TV < — • 
Combining the two previous inequalities leads to

ii 7 n / ii i -at ii i ii-JV n ^ C i 

IF 7 -^j\\H— < W -Kj \\H— + -nj\\H— < ~^ + ~' 



which implies the claimed equality in the limit $N \to +\infty$. □

Let us now introduce some definitions. For $k > 0$, we define

\[ P_k(P(E)) := \{ \pi \in P(P(E));\ M_k(\pi) := M_k(\pi_1) < \infty \} \]

and for $k, a > 0$, we define

\[ BP_{k,a}(E^N) := \{ F \in P(E^N);\ M_k(F_1) \le a \}. \]

Definition 5.2. For given sequences $(F^N)_N$ of $P_{sym}(E^N)$, $(\pi_n)_n$ of $P(P(E))$ and $\pi \in
P(P(E))$, we say that

- $(F^N)$ is bounded in $P_k(E^N)$ if there exists $a > 0$ such that $M_k(F^N_1) \le a$;

- $(\pi_n)$ is bounded in $P_k(P(E))$ if there exists $a > 0$ such that $M_k(\pi_{n,1}) \le a$;

- $(F^N)$ weakly converges to $\pi$ in $P_k(E^j)_{\forall j}$, we write $F^N \rightharpoonup \pi$ weakly in $P_k(E^j)_{\forall j}$, if
$(F^N)$ is bounded in $P_k(E^N)$ and $F^N_j \rightharpoonup \pi_j$ weakly in $P(E^j)$ for any $j \ge 1$;

- $(\pi_n)$ weakly converges to $\pi$ in $P_k(P(E))$ if $(\pi_n)$ is bounded in $P_k(P(E))$ and $\pi_n \rightharpoonup \pi$
weakly in $P(P(E))$.

With these (unconventional) definitions, any bounded sequence in $P_k(P(E))$ is weakly
compact in $P_k(P(E))$, and for any sequence $(F^N)$ of probability measures of $P_{sym}(E^N)$
which is bounded in $P_k(E^N)$, $k > 0$, there exist a subsequence $(F^{N'})$ and a mixture of
probability measures $\pi \in P_k(P(E))$ such that $F^{N'} \rightharpoonup \pi$ in $P(E^j)_{\forall j}$.

We now present a result about the equivalence of convergences for sequences of $P_{sym}(E^N)$,
$N \to \infty$, without any chaos hypothesis.

Theorem 5.3. Assume $E \subset \mathbb{R}^d$ is a Borel set.

(1) Consider $(F^N)$ a sequence of $P_{sym}(E^N)$ and $\pi \in P(P(E))$. The three following
assertions are equivalent:

(i) $F^N \rightharpoonup \pi$ in $P(E^j)_{\forall j}$, that is $F^N_j \rightharpoonup \pi_j$ weakly in $P(E^j)$ for any $j \ge 1$;

(ii) $\hat F^N \rightharpoonup \pi$ weakly in $P(P(E))$;

(iii) $W_1(F^N, \pi_N) \to 0$.

(2) For any 7 G [^7,-37) (recall that d! = m&x(d,2)), and any k > -i_ d i > 1> ^ere 
exists a constant C = C{j,d,k) such that the following estimate holds 

(5.4) ViV>l \Wi(F N , 7Tjv) — Wi(F N , tt)\ < ° . 

(3) With the same notations as in the second point, we have for any mixture of probability 
measures a, {3 G P(P(E)) 

/ x , ^ CM k {ai) l / k 

(5.5) (<%,«)< -JT~' 

where a.j is empirical probability distribution in P(P(E)) associated to the j-th marginal 
aj G P(E J ), as well as 

(5.6) mfeffl - gCj^Mt+j^ft)*) < H , l(Qj>ft) < Wl(Q>/3) . 
Proof of Theorem 5.3. Step 1. Equivalence between (i) and (ii) is classical. Let us
just sketch the proof. For any $\varphi \in C_b(E^j)$ we have from the Grunbaum lemma recalled in
Lemma 2.8 that

\[ \langle \hat F^N, R_\varphi \rangle = \langle F^N, \varphi \otimes 1^{\otimes (N-j)} \rangle + O(j^2/N) = \langle F^N_j, \varphi \rangle + O(j^2/N). \]

We deduce that the convergence $\langle \hat F^N, R_\varphi \rangle \to \langle \pi, R_\varphi \rangle$ is equivalent to the convergence
$\langle F^N_j, \varphi \rangle \to \langle \pi_j, \varphi \rangle$, since $\langle \pi, R_\varphi \rangle = \langle \pi_j, \varphi \rangle$ thanks to Theorem 5.1.
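The $O(j^2/N)$ error term has a simple combinatorial origin, which the following sketch makes explicit. Expanding $\langle \hat F^N, R_\varphi \rangle$ as an average over all $j$-tuples of indices in $\{1, \dots, N\}$, the tuples with pairwise distinct indices each contribute exactly $\langle F^N_j, \varphi \rangle$ by symmetry, so the error is controlled by $\|\varphi\|_\infty$ times the fraction of tuples with a repeated index. This is an illustration of the mechanism only; the precise constant of Lemma 2.8 is not reproduced in this part of the text, and the function names are hypothetical.

```python
from fractions import Fraction

def prob_distinct(n: int, j: int) -> Fraction:
    """Fraction of j-tuples of indices in {1..n} with pairwise distinct
    entries: n (n-1) ... (n-j+1) / n^j."""
    p = Fraction(1)
    for i in range(j):
        p *= Fraction(n - i, n)
    return p

def repeated_fraction(n: int, j: int) -> Fraction:
    """Fraction of j-tuples with at least one repeated index."""
    return 1 - prob_distinct(n, j)

for n in (10, 100, 1000):
    for j in (2, 3, 5):
        # union bound: at most j (j - 1) / (2 n)
        print(n, j, float(repeated_fraction(n, j)), j * (j - 1) / (2 * n))
```

Exact rational arithmetic is used so that the comparison with the union bound $j(j-1)/(2N)$ is not affected by rounding.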

Therefore, i) is equivalent to the convergence $\langle \hat F^N, \Phi \rangle \to \langle \pi, \Phi \rangle$ for any polynomial
function $\Phi \in C_b(P(E))$. But now, the family of probability measures $\hat F^N$ (and $\pi$) belongs
to the compact subset of $P(P(E))$

K := {a G P(P(J5)), s.t. a x = Fx}, 

and also any converging subsequence $\hat F^{N'}$ should converge weakly towards a probability
measure $\tilde\pi$ having the same marginals as $\pi$. Since by Theorem 5.1 marginals uniquely
characterize a probability measure on $P(P(E))$, it implies $\tilde\pi = \pi$, and then weak convergence
against polynomial functions implies the standard weak convergence of probability
measures ii).

It is classical that the MKW distance is a metrization of the weak convergence of measures.
Even in that "abstract" case, (ii) is equivalent to $W_1(\hat F^N, \pi) \to 0$ (recall that the distance
chosen in order to define $W_1$ is bounded). Thus, for sequences having a bounded moment
$M_k(F^N_1)$ for some $k > 0$, the equivalence between (ii) and (iii) will be a consequence of
(5.4). For sequences for which no moment is bounded, the same conclusion is true.
The correct argument still relies on a version of inequality (5.4), with a slower and less
explicit rate of convergence, which can be obtained from an adaptation of Lemma 2.1.

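Step 2. We now prove (5.4). For $\pi_N$ we have the following representation:

(5.7) $\pi_N = \int_{P(E)} \rho^{\otimes N}\, \pi(d\rho), \qquad \hat\pi_N = \int_{P(E)} \widehat{\rho^{\otimes N}}\, \pi(d\rho).$

Thanks to Proposition 2.14, we may compute

$|W_1(F^N, \pi_N) - W_1(\hat F^N, \pi)| = |W_1(\hat F^N, \hat\pi_N) - W_1(\hat F^N, \pi)|$

$\le W_1(\hat\pi_N, \pi) = W_1\Big( \int_{P(E)} \widehat{\rho^{\otimes N}}\, \pi(d\rho), \int_{P(E)} \delta_\rho\, \pi(d\rho) \Big)$

$\le \int_{P(E)} W_1(\widehat{\rho^{\otimes N}}, \delta_\rho)\, \pi(d\rho) = \int_{P(E)} \Omega_\infty(\rho)\, \pi(d\rho)$

$\le \frac{C}{N^\gamma} \int_{P(E)} M_k(\rho)^{1/k}\, \pi(d\rho) \le \frac{C}{N^\gamma}\, M_k(\pi_1)^{1/k},$

where we have successively used the triangle inequality for the $W_1$ distance, the relation
(5.7), the convexity property of the $W_1$ distance and the definition of the chaos measure
$\Omega_\infty$. We also used the bound (2.30) and the Jensen inequality (recall that $1/k \in (0,1]$) in
the last line.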
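
For the reader's convenience, the Jensen step used in the last line can be spelled out as follows. This is only a sketch: it assumes (as the notation suggests, but it is not restated in this section) that $M_k(\rho) = \int_E \langle v \rangle^k \rho(dv)$, so that $M_k$ is linear in the measure, and that $\pi_1 = \int_{P(E)} \rho\, \pi(d\rho)$.

```latex
% Sketch under the assumptions stated above:
%   M_k(\rho) = \int_E \langle v \rangle^k \rho(dv),   \pi_1 = \int_{P(E)} \rho \, \pi(d\rho).
% Since k \ge 1, the map t \mapsto t^{1/k} is concave on [0,\infty), hence by Jensen's inequality
\int_{P(E)} M_k(\rho)^{1/k} \, \pi(d\rho)
  \;\le\; \Big( \int_{P(E)} M_k(\rho) \, \pi(d\rho) \Big)^{1/k}
  \;=\; M_k(\pi_1)^{1/k} .
```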

Step 3. We now prove the third point. For the first inequality, choose s = ^- — Then 
by our assumptions, s > max(l, |) and we can apply Lemma 2.3 on the comparison of 
distances in $P(P(E))$ and Theorem 5.1 to get

i 2fc C Mh(a.-\ \ k 
Wi(&j,a) < C M k ( ai )k Wjj-^a^ajW < ^ 11 

For the first part of the second inequality (5.6) we write

$W_1(\alpha, \beta) \le W_1(\alpha, \hat\alpha_j) + W_1(\hat\alpha_j, \hat\beta_j) + W_1(\hat\beta_j, \beta),$

and we use the inequality just proved above and the identity (2.14). The second part of the
second inequality (5.6) is a mere application of Lemma 2.7. □

5.2. Level-3 Boltzmann entropy functional for mixtures. In this section we recover
some well-known results on the Boltzmann entropy for mixtures of probability measures
as stated in [2] and proved by Robinson and Ruelle in [64]. However, our proof differs
from the one of [64], and in particular it does not use the abstract representation result of
Choquet and Meyer [22] but an abstract Lemma 5.6 that we introduce for our purposes.

Let us assume that $E \subset \mathbb{R}^d$ is a Borel set and let us fix a real number $m > 0$. Then,
for any $\pi \in P_m(P(E))$ we define

(5.8) $\mathcal{H}(\pi) := \int_{P(E)} H(\rho)\, \pi(d\rho),$

where $H$ is the Boltzmann entropy defined on $P_m(E)$.

Theorem 5.4. (1) The functional $\mathcal{H} : P_m(P(E)) \to \mathbb{R} \cup \{\infty\}$ is proper, affine and l.s.c.
with respect to the weak convergence in $P_m(P(E))$. Moreover, for any $\pi \in P_m(P(E))$,
there holds

(5.9) $\mathcal{H}(\pi) = \sup_{j \ge 1} H(\pi_j) = \lim_{j \to \infty} H(\pi_j),$

where $\pi_j$ is the $j$-th marginal of $\pi$ defined in Theorem 5.1 and $H$ is the normalized
Boltzmann entropy defined on $P_m(E^j)$ for any $j \ge 1$.

(2) Consider $(F^N)$ a sequence of $P_{sym}(E^N)$ and $\pi \in P_m(P(E))$ such that $F^N \rightharpoonup \pi$
weakly in $P_m(E^j)$ $\forall j$. Then

(5.10) $\mathcal{H}(\pi) \le \liminf H(F^N).$

The proof of Theorem 5.4 uses the following two lemmas.

Lemma 5.5. For any $\pi \in P_m(P(E))$ we define

$\mathcal{H}'(\pi) := \sup_{j \ge 1} H(\pi_j).$

The functional $\mathcal{H}' : P_m(P(E)) \to \mathbb{R} \cup \{\infty\}$ is affine, proper, and l.s.c. for the weak
convergence, and

(5.11) $\mathcal{H}'(\pi) = \lim_{j \to \infty} H(\pi_j).$

The proof of Lemma 5.5 is classical. For the sake of completeness we nevertheless
present it.

PROOF of Lemma 5.5. Thanks to (3.1), for any $j \ge 1$, we have

$H(\pi_j) \ge \log c_m - \int_E \langle v \rangle^m\, \pi_1(dv),$
so that $\mathcal{H}'$ is proper on $P_m(P(E))$. It is also l.s.c. as the supremum of l.s.c. functions,
since $H_j$ is l.s.c. on $P_m(E^j)$, as recalled in Lemma 3.1, and since the right-hand
inequality in (5.6) shows that $\pi \mapsto \pi_j$ is also continuous for the weak convergence of
measures.

As a second step, we establish (5.11). For any fixed $\ell \ge 1$ and any $j \ge \ell$ we introduce
the Euclidean decomposition $j = n\ell + r$, $0 \le r \le \ell - 1$, and a direct iterative application
of inequality (3.7) together with (3.1) imply

$H_j(\pi_j) \ge n\, H_\ell(\pi_\ell) + H_r(\pi_r)$

$\ge n\, H_\ell(\pi_\ell) + (\ell - 1)\, [(\log c_m)_- - M_m(\pi_1)].$

We deduce that for any $\ell \ge 1$

$\liminf_{j \to \infty} H(\pi_j) \ge \liminf_{j \to \infty} \frac{n}{j}\, H_\ell(\pi_\ell) = H(\pi_\ell),$

from which (5.11) follows.

We conclude by establishing the affine property of $\mathcal{H}'$. Let us consider $F, G \in P_m(P(E))$
and $\theta \in (0,1)$, and let us assume that $H(F_j) < \infty$, $H(G_j) < \infty$ for any $j \ge 1$, the case
when $H(F_j) = \infty$ or $H(G_j) = \infty$ being trivial. Using that $s \mapsto \log s$ is an increasing
function and that $s \mapsto s \log s$ is a convex function, we have

$H(\theta F_j + (1-\theta) G_j) = \frac{1}{j} \int_{E^j} (\theta F_j + (1-\theta) G_j) \log(\theta F_j + (1-\theta) G_j)$

$\ge \frac{1}{j} \int_{E^j} \{ \theta F_j \log(\theta F_j) + (1-\theta) G_j \log((1-\theta) G_j) \}$

$= \theta H(F_j) + (1-\theta) H(G_j) + \frac{1}{j} [\theta \log\theta + (1-\theta) \log(1-\theta)]$

$\ge H(\theta F_j + (1-\theta) G_j) + \frac{1}{j} [\theta \log\theta + (1-\theta) \log(1-\theta)].$

Passing to the limit $j \to \infty$ in the two preceding inequalities and using (5.11), we get

$\mathcal{H}'(\theta F + (1-\theta) G) \ge \theta \mathcal{H}'(F) + (1-\theta) \mathcal{H}'(G) \ge \mathcal{H}'(\theta F + (1-\theta) G),$

which is nothing but the announced affine property. □
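
The reason the correction term disappears in the limit $j \to \infty$ is elementary and may be worth recording. The following bound is a standard fact about the binary entropy function; the symbol $h$ is introduced here only for illustration.

```latex
% h is a hypothetical shorthand, not notation from the paper.
% For \theta \in (0,1), set h(\theta) := \theta \log\theta + (1-\theta)\log(1-\theta).
% h is convex, symmetric about 1/2 and nonpositive, with minimum h(1/2) = -\log 2, so
-\log 2 \;\le\; h(\theta) \;\le\; 0 ,
\qquad\text{hence}\qquad
\Big| \frac{1}{j}\, h(\theta) \Big| \;\le\; \frac{\log 2}{j} \;\xrightarrow[j \to \infty]{}\; 0 .
```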

We now establish, in the following abstract lemma, the last argument which allows us to
prove the first equality in (5.9) and which will be useful in the next section in order to get
the same property for the similar functionals on $P_m(P(E))$ built from the Fisher
information.

Lemma 5.6. Consider a sequence $(K_j)$ of functionals on $P_m(E^j)$, $m \ge 0$, such that

(i) $K_j : P_m(E^j) \to \mathbb{R} \cup \{+\infty\}$ is convex, proper and l.s.c. for the weak convergence
of measures on $P_m(E^j)$ for any $j \ge 1$. Moreover, either $m = 0$ and $K_j$ is positive
for each $j$, or $m > 0$ and there exist $k \in (0,m)$ and a constant $C_k \in \mathbb{R}_+$ such that
the functional

$P(E^j) \to \mathbb{R} \cup \{\infty\}, \quad G \mapsto K_j(G) + j\,[C_k + M_k(G)]$

is nonnegative and is l.s.c. with respect to the weak convergence in $P(E^j)$.

(ii) $j^{-1} K_j(f^{\otimes j}) = K_1(f)$ for all $f \in P_m(E)$ and $j \ge 1$.

(iii) $K_j(G) \ge K_\ell(G_\ell) + K_r(G_r)$ for any $G \in P(E^j)$ and any $\ell, r$ such that $j = \ell + r$.

(iv) The functional $\mathcal{K}' : P_m(P(E)) \to \mathbb{R} \cup \{+\infty\}$ defined for any $\pi \in P_m(P(E))$ by
(it is part of the theorem that the sup equals the lim)

$\mathcal{K}'(\pi) := \sup_{j \ge 1} \frac{1}{j} K_j(\pi_j) = \lim_{j \to \infty} \frac{1}{j} K_j(\pi_j),$

where $\pi_j$ denotes the $j$-th marginal defined thanks to Theorem 5.1, is affine in the
following sense. For any probability measure $\pi \in P_m(P(E))$ and any
partition of $P_m(E)$ by some sets $\omega_i$, $1 \le i \le M$, such that $\omega_i$ is an open set in
$E \setminus (\omega_1 \cup \dots \cup \omega_{i-1})$ for any $1 \le i \le M-1$, $\omega_M = P_m(E) \setminus (\omega_1 \cup \dots \cup \omega_{M-1})$ and
$\pi(\omega_i) > 0$ for any $1 \le i \le M$, defining

$\alpha_i := \pi(\omega_i)$ and $\gamma^i := \frac{1}{\alpha_i}\, 1_{\omega_i}\, \pi \in P_m(P(E))$

so that

$\pi = \alpha_1 \gamma^1 + \dots + \alpha_M \gamma^M$ and $\alpha_1 + \dots + \alpha_M = 1$;

there holds

$\mathcal{K}'(\pi) = \alpha_1 \mathcal{K}'(\gamma^1) + \dots + \alpha_M \mathcal{K}'(\gamma^M).$

Then under the above assumptions, for any $\pi \in P_m(P(E))$, there holds

$\mathcal{K}'(\pi) = \mathcal{K}(\pi) := \int_{P(E)} K_1(\rho)\, \pi(d\rho).$

The functional $\mathcal{K} : P_m(P(E)) \to \mathbb{R} \cup \{\infty\}$ is affine, proper and l.s.c. with respect to the
weak convergence in $P_m(P(E))$.

Moreover, it satisfies the following $\Gamma$-l.s.c. property. For any sequence $F^N$ of $P_{sym}(E^N)$
and $\pi \in P(P(E))$ such that $F^N \rightharpoonup \pi$ weakly in $P_m(E^j)$ $\forall j$, there holds

(5.12) $\mathcal{K}(\pi) \le \liminf_{N \to \infty} N^{-1} K_N(F^N).$
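
To illustrate hypothesis (ii), here is how it can be checked in the model case of the entropy. This is only a sketch, under the assumption that $K_j$ is the non-normalized Boltzmann entropy $K_j(G) = \int_{E^j} G \log G$ and that $f$ has a density with finite entropy; the precise definitions are those of Section 3 and are not restated here.

```latex
% Sketch; assumes K_j(G) = \int_{E^j} G \log G (non-normalized entropy, an assumption here).
% For a tensor product, \log f^{\otimes j}(x_1,\dots,x_j) = \sum_{i=1}^j \log f(x_i), hence
K_j(f^{\otimes j})
  = \sum_{i=1}^{j} \int_{E^j} f^{\otimes j}(x_1,\dots,x_j)\, \log f(x_i)\, dx_1 \cdots dx_j
  = j \int_E f \log f
  = j\, K_1(f),
% where each term reduces to \int_E f \log f because the other j-1 factors integrate to 1.
```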



PROOF of Lemma 5.6. We split the proof into five steps.

Step 1. A first inequality $\mathcal{K} \ge \mathcal{K}'$. We skip the proof that the lim equals the sup in point
(iv). This is a consequence of hypothesis (iii) - and of the lower bound in point (i) in
the case $m > 0$ - and has already been proved in the proof of Lemma 5.5 for the entropy.
We fix $\pi \in P_m(P(E))$. Thanks to assumptions (i) and (ii), we easily compute

$\mathcal{K}(\pi) = \int_{P(E)} \frac{1}{j} K_j(\rho^{\otimes j})\, \pi(d\rho)$

$\ge \frac{1}{j} K_j\Big( \int_{P(E)} \rho^{\otimes j}\, \pi(d\rho) \Big) = \frac{1}{j} K_j(\pi_j).$

Taking the supremum over $j$ in this inequality, we get a first inequality

$\mathcal{K}(\pi) \ge \sup_{j \ge 1} \frac{1}{j} K_j(\pi_j) = \mathcal{K}'(\pi).$

Step 2. J is l.s.c. on P m (E) with respect to the Wi-metric. 

We consider the case when $m > 0$, and choose $k \in (0,m)$ such that (i) holds. We
explain in Step 2' below the necessary adaptation in the case $m = 0$.

For any $\delta > 0$, by compactness, we can find a finite family of $N$ balls
$B_i := B(\rho_i, \delta) = \{\rho \in P_m(E);\ W_1(\rho,\rho_i) < \delta\}$, $\rho_i \in BP_{m,1/\delta}$, of radius $\delta$ so that

$BP_{m,1/\delta} \subset \bigcup_{i=1}^{N} B_i.$
We associate to that partition an "almost" partition of unity by

Xp, 

25 



, . . Wi(p,pi) -] ct>i{p) 

<Pi[P) := 2 1 — , 9i{p) := 



Finally, we set for any $\rho \in P_m(E)$

$J^\delta(\rho) := \sum_{i=1}^{N} \theta_i(\rho)\, J_i^\delta,$

where

$J_i^\delta := \inf_{\rho' \in B(\rho_i, 2\delta)} J(\rho')$ and $J(\rho) := K_1(\rho) + C_k + M_k(\rho).$

We claim that by construction the functional $J^\delta$ is Lipschitz with respect to the $W_1$ metric
on $P(E)$, and satisfies

(5.13) Vp G P m (E), lgI ^ (p) , mf J(p') < J s (p) < J(p), 

1 + p'eB(p,4<5) 

where $1$ denotes the indicator function. To obtain both inequalities, we introduce $I^\delta(\rho) :=
\{1 \le i \le N,\ W_1(\rho,\rho_i) < 2\delta\}$, and rewrite

$J^\delta(\rho) := \sum_{i \in I^\delta(\rho)} \theta_i(\rho)\, J_i^\delta.$

But for any $i$ such that $W_1(\rho,\rho_i) < 2\delta$ we have

$\inf_{\rho' \in B(\rho, 4\delta)} J(\rho') \le J_i^\delta = \inf_{\rho' \in B(\rho_i, 2\delta)} J(\rho') \le J(\rho).$

The upper bound in (5.13) follows from the second inequality (on the right). Since $J(\rho) \ge 0$
by hypothesis (i), the first inequality above implies that

J 5 (P) > ( V UP)) mf J(p') > ^= l ^ p) inf J x ( p '). 

The lower bound in (5.13) then follows because any $\rho \in BP_{m,1/\delta}$ belongs to at least one
of the $B_i$, and then $\sum_{j=1}^{N} \phi_j(\rho) \ge \phi_i(\rho) \ge 1$. The inequalities (5.13) and the
hypothesis that $J$ is l.s.c. with respect to the weak convergence on $P(E)$ imply that

(5.14) $\forall \rho \in P_m(E), \quad \lim_{\delta \to 0} J^\delta(\rho) = J(\rho).$

We can now introduce the functionals $\mathcal{J}^\delta$ and $\mathcal{J}$ defined for all $\pi \in P_m(P(E))$ by

$\mathcal{J}^\delta(\pi) := \int J^\delta(\rho)\, \pi(d\rho),$

$\mathcal{J}(\pi) := \int J(\rho)\, \pi(d\rho) = \mathcal{K}(\pi) + C_k + M_k(\pi).$

Since $J^\delta$ is Lipschitz with respect to the $W_1$-metric, the Kantorovich-Rubinstein duality
theorem [72, Theorem 1.14] implies that the functional $\mathcal{J}^\delta$ is continuous with respect to
the $W_1$-metric. Moreover, the upper bound in (5.13) implies that $\mathcal{J}^\delta(\pi) \le \mathcal{J}(\pi)$ for any
$\pi \in P_m(P(E))$. Finally, an application of Fatou's Lemma together with (5.14) implies

$\liminf_{\delta \to 0} \mathcal{J}^\delta(\pi) = \liminf_{\delta \to 0} \int J^\delta(\rho)\, \pi(d\rho) \ge \int \liminf_{\delta \to 0} J^\delta(\rho)\, \pi(d\rho)$

$\ge \int_{P_m(E)} J(\rho)\, \pi(d\rho) = \mathcal{J}(\pi).$

All in all, we get that

$\forall \pi \in P_m(P(E)), \quad \mathcal{J}(\pi) = \sup_{\delta > 0} \mathcal{J}^\delta(\pi),$

and that implies that $\mathcal{J}$ is l.s.c. with respect to the $W_1$-metric, since the $\mathcal{J}^\delta$ are continuous
with respect to that metric.
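
The final implication is the standard fact that a supremum of continuous functionals is lower semicontinuous. For completeness, a one-line argument is sketched below, for an arbitrary sequence $\pi_n \to \pi$ in the $W_1$-metric (the sequence $(\pi_n)$ is introduced only for this illustration).

```latex
% (\pi_n) is a hypothetical sequence with W_1(\pi_n, \pi) \to 0.
% For every fixed \delta > 0, using \mathcal{J} \ge \mathcal{J}^\delta and the continuity of \mathcal{J}^\delta:
\liminf_{n \to \infty} \mathcal{J}(\pi_n)
  \;\ge\; \liminf_{n \to \infty} \mathcal{J}^\delta(\pi_n)
  \;=\; \mathcal{J}^\delta(\pi) .
% Taking the supremum over \delta > 0 on the right-hand side yields
\liminf_{n \to \infty} \mathcal{J}(\pi_n) \;\ge\; \sup_{\delta > 0} \mathcal{J}^\delta(\pi) \;=\; \mathcal{J}(\pi) .
```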

Step 2'. A necessary adaptation in the case $m = 0$. In that case, things are in some sense
simpler since the functional $\mathcal{K}$ is already positive, so that we may try to apply
Step 2 directly with $J = K_1$. However, there is one difficulty: the compact sets $BP_{m,1/\delta}$ do not
cover $P(E)$, even if we take their union over $\delta > 0$ and $m > 0$.

However, we can still give a correct proof if we fix $\pi$ at the beginning. We then choose an
increasing function $g : \mathbb{R}_+ \to \mathbb{R}_+$ such that

(5.15) $\lim_{v \to +\infty} g(v) = +\infty$ and $M_g(\pi_1) := \int_E g(\langle v \rangle)\, \pi_1(dv) < \infty.$

Then we can restrict ourselves to the set $P_g := \{\rho \in P(E),\ M_g(\rho) < +\infty\}$, since the last
hypothesis on $g$ implies that $\pi(P_g(E)) = 1$. If we now replace in Step 2 the sets $BP_{m,1/\delta}$
by the still compact sets

$BP_{g,1/\delta} := \{\rho,\ M_g(\rho) \le \delta^{-1}\},$

and follow the same strategy, we will conclude that $\mathcal{K}(\pi) = \sup_{\delta > 0} \mathcal{K}^\delta(\pi)$, where the $\mathcal{K}^\delta$ will
be continuous with respect to the $W_1$-metric. It implies that $\mathcal{K}$ is l.s.c. at $\pi$. Since $\pi$ is
arbitrary, $\mathcal{K}$ is globally l.s.c.

Step 3. $\mathcal{K}$ is l.s.c. with respect to the weak convergence of measures on $P_m(P(E))$.

In the case $m = 0$, that step is useless since in Step 2' we proved that $\mathcal{K} = \mathcal{J}$ is l.s.c. So
it only remains to treat the case $m > 0$. Since $\mathcal{J} = \mathcal{K} + M_k + C_k$ is l.s.c. with respect to
the $W_1$-metric on $P_m(P(E))$, the conclusion will follow if we show that $M_k$ is continuous
with respect to the weak convergence on $P_m(P(E))$, defined in Definition 5.2.

For this, we choose $\rho, \mu \in P_m(E)$. Since

$\forall v, v' \in E, \quad |\langle v \rangle^k - \langle v' \rangle^k| \le k \min(1, |v - v'|)\, (\langle v \rangle^k + \langle v' \rangle^k),$

we obtain, choosing an optimal transference plan $\pi$ (for the distance $d_E$ on $E$) between
$\rho$ and $\mu$,

$|M_k(\rho) - M_k(\mu)| \le \int |\langle v \rangle^k - \langle v' \rangle^k|\, \pi(dv,dv')$

$\le k \int d_E(v,v')\, (\langle v \rangle^k + \langle v' \rangle^k)\, \pi(dv,dv')$

$\le k \Big( \int d_E(v,v')^{\frac{m}{m-k}}\, \pi(dv,dv') \Big)^{1 - \frac{k}{m}} (M_m(\rho) + M_m(\mu))^{\frac{k}{m}},$

so that

$|M_k(\rho) - M_k(\mu)| \le k\, (M_m(\rho) + M_m(\mu))^{\frac{k}{m}}\, W_1(\rho,\mu)^{1 - \frac{k}{m}},$
where we have used the Hölder inequality and the fact that $d_E \le 1$. Choosing now two
$\alpha, \beta \in P_m(P(E))$ and an optimal transference plan $\pi$ (for the distance $W_1$ on $P(E)$)
between them, we get

$|M_k(\alpha) - M_k(\beta)| = \Big| \int \big( M_k(\rho) - M_k(\rho') \big)\, \pi(d\rho,d\rho') \Big|$

$\le k \int (M_m(\rho) + M_m(\rho'))^{\frac{k}{m}}\, W_1(\rho,\rho')^{1 - \frac{k}{m}}\, \pi(d\rho,d\rho'),$

and then

$|M_k(\alpha) - M_k(\beta)| \le k\, (M_m(\alpha) + M_m(\beta))^{\frac{k}{m}}\, W_1(\alpha,\beta)^{1 - \frac{k}{m}},$

where we have used the Hölder inequality. This concludes the step, since weak convergence on
$P_m(P(E))$ exactly means that $W_1$ goes to zero and the moments of order $m$ are bounded.
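
The Hölder step used twice above can be made explicit as follows. This is a sketch with generic notation ($\rho$, $\mu$ two measures in $P_m(E)$, $\pi$ a coupling that is optimal for $d_E$; the names $p$, $q$ are introduced here). Bounding $(a+b)^{p}$ by convexity produces an extra harmless factor $2^{1-k/m}$, which can be absorbed into the constant.

```latex
% Sketch; conjugate exponents p := m/k and q := m/(m-k), so that 1/p + 1/q = 1.
\int d_E(v,v')\,\big(\langle v\rangle^k + \langle v'\rangle^k\big)\,\pi(dv,dv')
  \le \Big(\int d_E(v,v')^{q}\,\pi(dv,dv')\Big)^{1/q}
      \Big(\int \big(\langle v\rangle^k + \langle v'\rangle^k\big)^{p}\,\pi(dv,dv')\Big)^{1/p} .
% Since d_E \le 1 and q \ge 1, d_E^q \le d_E, and \pi is optimal, so the first factor is at most
%   W_1(\rho,\mu)^{1/q} = W_1(\rho,\mu)^{1 - k/m}.
% Since (a+b)^p \le 2^{p-1}(a^p + b^p) for a,b \ge 0 and p \ge 1, the second factor is at most
%   2^{1 - k/m}\,\big(M_m(\rho) + M_m(\mu)\big)^{k/m}.
```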

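Step 4. Proof of the remaining inequality $\mathcal{K}' \ge \mathcal{K}$. Because $P_m(E)$ endowed with the
MKW distance $W_1$ is a Polish space, for any fixed $\varepsilon > 0$, we can cover it by a countable
union of balls $B_n := B(f_n, \varepsilon)$ of radius $\varepsilon$. For a given $\pi \in P_m(P(E))$, we can choose $M$
such that

$\omega_M := P_m(E) \setminus (B_1 \cup \dots \cup B_{M-1})$ satisfies $\pi(\omega_M) \le \varepsilon$

and denote $\omega_i := B_i \setminus (B_1 \cup \dots \cup B_{i-1})$ for all $1 \le i \le M-1$. We define then

$\alpha_i := \pi(\omega_i), \quad \gamma^i := \frac{1}{\alpha_i}\, \pi|_{\omega_i}, \quad \pi^M := \sum_{i=1}^{M} \alpha_i\, \delta_{\gamma^i_1}, \quad \gamma^i_1 = \int_{P(E)} \rho\, \gamma^i(d\rho).$

For any $1 \le i \le M$, we have

$\mathcal{K}'(\gamma^i) := \sup_{j \ge 1} \frac{1}{j} K_j(\gamma^i_j) \ge K_1(\gamma^i_1).$

Using the affine property (iv) of $\mathcal{K}'$, the above inequality and the definitions of $\pi^M$ and
$\mathcal{K}$, we get

$\mathcal{K}'(\pi) = \alpha_1 \mathcal{K}'(\gamma^1) + \dots + \alpha_M \mathcal{K}'(\gamma^M)$

(5.16) $\mathcal{K}'(\pi) \ge \alpha_1 K_1(\gamma^1_1) + \dots + \alpha_M K_1(\gamma^M_1) = \mathcal{K}(\pi^M).$

We observe that because $\pi^M_1 = \pi_1$, we have

$\langle \pi^M_1, |v|^m \rangle = \langle \pi_1, |v|^m \rangle = M_m(\pi) < \infty,$

and in particular $\pi^M \in P_m(P(E))$. Moreover, defining $T^M : P(E) \to \{\gamma^1_1, \dots, \gamma^M_1\}$ by
$T^M(\rho) = \gamma^i_1$ for any $\rho \in \omega_i$, we have $\pi^M = (T^M)_\# \pi$ and then

$W_1(\pi, \pi^M) \le \langle (\mathrm{id} \otimes T^M)_\# \pi,\ W_1(\cdot,\cdot) \rangle \le 2\varepsilon.$

We consider now a sequence $\varepsilon \to 0$ and the corresponding sequence $(\pi^M)$, for which we
then have by construction $\pi^M \rightharpoonup \pi$ weakly in $P_m(P(E))$. Inequality (5.16), the above
convergence and the l.s.c. property of $\mathcal{K}$ proved in Steps 2 and 3 imply the second (and
reverse) inequality

$\mathcal{K}(\pi) \le \liminf_{M \to \infty} \mathcal{K}(\pi^M) \le \mathcal{K}'(\pi).$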

Step 5. The $\Gamma$-l.s.c. property of $\mathcal{K}$. We give the proof only in the case $m > 0$, the case
$m = 0$ being simpler. We consider $(F^N)$ a sequence of $P_{sym}(E^N)$ and $\pi \in P(P(E))$ such
that $F^N \rightharpoonup \pi$ weakly in $P_m(E^j)$ $\forall j$; in particular $M_m(F^N_1) \le a$ for some $a \in (0,\infty)$. For
any fixed $j \ge 1$, using the l.s.c. property of $K_j$, introducing the Euclidean decomposition
$N = nj + r$, $0 \le r \le j-1$, and using iteratively inequality (iii) of the hypothesis as
in the proof of Lemma 5.5, as well as the lower bound on $K_r$ provided by hypothesis (i),
we get

$\frac{1}{j} K_j(\pi_j) \le \liminf_{N \to \infty} \frac{1}{j} K_j(F^N_j)$

$\le \liminf_{N \to \infty} \frac{1}{nj} \{ K_N(F^N) - K_r(F^N_r) \}$

$\le \liminf_{N \to \infty} \Big\{ \frac{1}{nj} K_N(F^N) + \frac{1}{n} (C_m + a) \Big\}$

$= \liminf_{N \to \infty} N^{-1} K_N(F^N).$

We deduce (5.10) thanks to (5.9). That concludes the proof. □ 
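
The iteration of hypothesis (iii) used in Step 5 can be sketched as follows. The only input, besides (iii), is that $F^N$ is symmetric, so that all its marginals on $j$ coordinates coincide with $F^N_j$.

```latex
% Sketch; N = nj + r with 0 \le r \le j-1, and F^N symmetric.
% Splitting off one block of j variables at a time with (iii):
K_N(F^N) \;\ge\; K_j(F^N_j) + K_{N-j}(F^N_{N-j})
         \;\ge\; \dots
         \;\ge\; n\,K_j(F^N_j) + K_r(F^N_r) .
% Dividing by nj and noting that N/(nj) = 1 + r/(nj) \to 1 as N \to \infty (j fixed):
\frac{1}{j}\,K_j(F^N_j) \;\le\; \frac{N}{nj}\cdot\frac{1}{N}\,K_N(F^N) \;-\; \frac{1}{nj}\,K_r(F^N_r) .
```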

Proof of Theorem 5.4. The proof is just an application of the two previously proved
lemmas. First, let us observe that $H_j$, $\mathcal{H}$ and $\mathcal{H}'$ fulfill the assumptions of Lemma 5.6,
since (i) is nothing but (3.5), (ii) is a consequence of Lemma 3.1, (iii) is nothing but
(3.7), and a stronger version of (iv) has been established in Lemma 5.5. Then (5.9) and
(5.10) are exactly the conclusion of Lemma 5.6 applied to the entropy. □

5.3. Level-3 Fisher information for mixtures. We now state a similar result for the
Fisher information for mixtures of probability measures.

Let us assume that $E = \mathbb{R}^d$ or $E$ is an open connected and bounded set of $\mathbb{R}^d$ with
smooth boundary. Then, for any $\pi \in P(P(E))$ we define

(5.17) $\mathcal{I}(\pi) := \int_{P(E)} I(\rho)\, \pi(d\rho),$

where $I$ is the Fisher information defined on $P(E)$.

Theorem 5.7. (1) The functional $\mathcal{I} : P(P(E)) \to \mathbb{R} \cup \{\infty\}$ is affine, nonnegative and
l.s.c. for the weak convergence. Moreover, for any $\pi \in P(P(E))$, there holds

(5.18) $\mathcal{I}(\pi) = \sup_{j \ge 1} I(\pi_j) = \lim_{j \to \infty} I(\pi_j),$

where $I$ stands for the normalized Fisher information defined in $P_{sym}(E^j)$ for any $j \ge 1$.

(2) Consider $(F^N)$ a sequence of $P_{sym}(E^N)$ and $\pi \in P(P(E))$ such that $F^N \rightharpoonup \pi$ weakly
in $P(E^j)$ $\forall j$. Then

(5.19) $\mathcal{I}(\pi) \le \liminf I(F^N).$

As for Theorem 5.4, the proof of Theorem 5.7 relies on the abstract Lemma 5.6. The
hypotheses of that lemma are shown to hold in Lemma 5.10 below. Two useful
intermediate results are stated in the next two lemmas.

Lemma 5.8. There exist:

- a family of regularizing operators $S_t : P(E) \to P(E)$ defined for any $t > 0$,

- a family $(C_t)$ of positive constants,

- a family $(\varepsilon_t)$ of positive constants such that $\varepsilon_t \to 0$ when $t \to 0$,

- for any $k > 0$, a family $(\varepsilon'_{k,t})$ of positive constants so that $\varepsilon'_{k,t} \to 0$ when $t \to 0$,

such that for any $\rho \in P(E)$ and any $t > 0$, denoting $\rho_t := S_t(\rho)$, we have

(5.20) $I(\rho_t) \le I(\rho), \quad M_k(\rho_t) \le 2^k (M_k(\rho) + \varepsilon'_{k,t}), \quad \|\nabla \ln \rho_t\|_\infty \le C_t$
and $W_1(\rho, \rho_t) \le \varepsilon_t.$
Proof of Lemma 5.8. We only consider the case $E = \mathbb{R}^d$. The case when $E$ is a
smooth bounded open set can be handled similarly by using for $(\rho_t)$ the solution of the
heat equation (with Neumann boundary conditions) and the strong maximum principle.
We define

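$\eta_t(z) := \frac{C}{t^d}\, e^{-\langle z/t \rangle} = \frac{C}{t^d}\, e^{-\sqrt{1 + |z|^2/t^2}}$ and $\rho_t := \eta_t * \rho.$

Observing that

$\frac{|\nabla \eta_t(z)|}{\eta_t(z)} = \frac{1}{t}\, \frac{|z/t|}{\langle z/t \rangle} \le \frac{1}{t},$

we deduce that for any $x \in \mathbb{R}^d$, we have

$|\nabla \rho_t(x)| \le \frac{1}{t} \int \eta_t(x-y)\, \rho(y)\, dy = \frac{1}{t}\, \rho_t(x).$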

The inequality on the moment of order $k$ is a consequence of the inequality

$\langle x + y \rangle^k \le 2^k (\langle x \rangle^k + \langle y \rangle^k),$

which leads to the claimed inequality with $\varepsilon'_{k,t} = M_k(\eta_t) = t^k M_k(\eta_1)$.

As $\rho_t$ is also an average of translations of $\rho$ (each of which has the same Fisher information as
$\rho$), the convexity of the Fisher information implies that

$I(\rho_t) = I\Big( \int \rho(\cdot - z)\, \eta_t(dz) \Big) \le \int I(\rho(\cdot - z))\, \eta_t(dz) = I(\rho).$

We finally observe that for any $\rho \in P(E)$ there holds

$W_1(\rho, \rho_t) = W_1(\rho, \rho * \eta_t) \le \int |z|\, \eta_t(z)\, dz = C_d\, t,$

and that proves the last estimate. □
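
The last bound can be read as a coupling argument. The sketch below assumes that the distance $d_E$ on $E = \mathbb{R}^d$ satisfies $d_E(x,y) \le |x-y|$ and that the mollifier has the scaling form $\eta_t(z) = t^{-d} \eta_1(z/t)$; the random variables $X$ and $Z$ are introduced only for this illustration.

```latex
% Sketch; X \sim \rho and Z \sim \eta_t independent (hypothetical auxiliary variables).
% Then X + Z \sim \rho * \eta_t, so the law of (X, X+Z) is a coupling of \rho and \rho_t, and
W_1(\rho, \rho * \eta_t)
  \;\le\; \mathbb{E}\big[\, d_E(X, X+Z) \,\big]
  \;\le\; \mathbb{E}\,|Z|
  \;=\; \int_{\mathbb{R}^d} |z|\, \eta_t(z)\, dz .
% If \eta_t(z) = t^{-d}\,\eta_1(z/t), the change of variables z = t y gives
\int_{\mathbb{R}^d} |z|\, \eta_t(z)\, dz \;=\; t \int_{\mathbb{R}^d} |y|\, \eta_1(y)\, dy \;=:\; C_d\, t .
```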

Lemma 5.9. Consider $\pi \in P(P(E))$ and define the regularized family $\pi_t \in P(P(E))$, for
$t > 0$, by push-forward by $S_t$, $\pi_t := S_t \# \pi$, or equivalently

$\langle \pi_t, \Phi \rangle = \langle \pi, \Phi_t \rangle \quad \forall \Phi \in C_b(P(E)),$

where $\Phi_t \in C_b(P(E))$ is defined by $\Phi_t(\rho) := \Phi(\rho_t)$ and $\rho_t$ is defined in Lemma 5.8.
Also denote by $\pi_{tj} \in P(E^j)$ the $j$-th marginal of $\pi_t$ defined thanks to Theorem 5.1. For
any $t > 0$ and any $X^j := (x_1, \dots, x_j) \in E^j$ there holds

(5.21) $|\nabla_1 \ln \pi_{tj}(X^j)| \le C_t.$

Proof of Lemma 5.9. Thanks to Lemma 5.8, we write

$\frac{|\nabla_1 \pi_{tj}(X^j)|}{\pi_{tj}(X^j)} = \frac{\big| \int \nabla_1 \rho_t(x_1)\, \rho_t^{\otimes (j-1)}(x_2, \dots, x_j)\, \pi(d\rho) \big|}{\pi_{tj}(X^j)} \le C_t,$

which is nothing but (5.21). □
Lemma 5.10. For any $\pi \in P(P(E))$ we define

$\mathcal{I}'(\pi) := \sup_{j \in \mathbb{N}^*} I(\pi_j).$

The functional $\mathcal{I}' : P(P(E)) \to \mathbb{R} \cup \{\infty\}$ is nonnegative, l.s.c. for the weak convergence,
satisfies

(5.22) $\mathcal{I}'(\pi) = \lim_{j \to \infty} I(\pi_j)$

and is affine in the same sense as formulated in point (iv) of Lemma 5.6.

Proof of Lemma 5.10. The fact that $\mathcal{I}'$ is nonnegative and l.s.c. is clear, and (5.22)
comes from the monotonicity property $I(\pi_{j-1}) \le I(\pi_j)$, $\forall j \ge 2$, established in Lemma 3.7
(i). It only remains to prove the linearity property of $\mathcal{I}'$. For the sake of simplicity we
only consider the case when $M = 2$ and $\omega_1$ is a ball. The case when $\omega_1$ is a general open
set can be handled in a similar way, and the case when $M \ge 3$ can be deduced by an
iterative argument. For some given $\pi \in P(P(E))$ which is not a Dirac mass, $f_1 \in P(E)$
and $r \in (0,\infty)$ so that

$\theta := \pi(B_r) \in (0,1), \quad B_r := B(f_1, r) = \{\rho,\ W_1(\rho, f_1) < r\},$

we define

$F := \frac{1}{\theta}\, 1_{B_r}\, \pi, \quad G := \frac{1}{1-\theta}\, 1_{B_r^c}\, \pi,$

so that

$F, G \in P(P(E))$ and $\pi = \theta F + (1-\theta) G,$

and we have to prove that

(5.23) $\mathcal{I}'(\pi) = \theta\, \mathcal{I}'(F) + (1-\theta)\, \mathcal{I}'(G).$

We split the proof of that claim in four steps.

Step 1. Approximation and estimation of the affinity defect. As explained for $\pi$ in the
statement of Lemma 5.9, we define $F_t$ and $G_t$ to be the push-forward of the measures $F$
and $G$ by the regularisation operator $S_t$, and then $F_{tj}$ and $G_{tj}$ are their projections on
$P(E^j)$:

$F_{tj} := \int \rho^{\otimes j}\, F_t(d\rho) = \int \rho_t^{\otimes j}\, F(d\rho),$ or $\langle F_{tj}, \varphi \rangle = \int R_\varphi(\rho)\, F_t(d\rho),$

via duality, for any $\varphi \in C_b(E^j)$, where $R_\varphi$ is the polynomial on $P(E)$ associated to $\varphi$
thanks to (2.8). The same holds for $G$. We also remark that these two operations
(regularisation and projection on $E^j$) commute if we define the regularisation operators $S_t$
on $E^j$ by the convolution with $\eta_t^{\otimes j}$. It is worth emphasizing that we do not need here, in
order to define these objects, that $F$ and $G$ are probability measures, but only that they
are Radon measures on $P(E)$.

For any given $j \in \mathbb{N}$, we define

$A_{tj} := \theta\, I(F_{tj}) + (1-\theta)\, I(G_{tj}) - I(\theta F_{tj} + (1-\theta) G_{tj})$

$= \theta \int \frac{|\nabla_1 F_{tj}|^2}{F_{tj}} + (1-\theta) \int \frac{|\nabla_1 G_{tj}|^2}{G_{tj}} - \int \frac{|(1-\theta) \nabla_1 G_{tj} + \theta \nabla_1 F_{tj}|^2}{(1-\theta) G_{tj} + \theta F_{tj}}.$

After reduction to the same denominator, and some simplification, we end up with

$A_{tj} = \theta (1-\theta) \int \frac{G_{tj} F_{tj}}{(1-\theta) G_{tj} + \theta F_{tj}}\, \Big| \nabla_1 \ln \frac{F_{tj}}{G_{tj}} \Big|^2$

$\le 2 \theta (1-\theta) \int \frac{G_{tj} F_{tj}}{(1-\theta) G_{tj} + \theta F_{tj}}\, \big( |\nabla_1 \ln F_{tj}|^2 + |\nabla_1 \ln G_{tj}|^2 \big).$

We can estimate the r.h.s. term thanks to Lemma 5.9 by

$A_{tj} \le 4 \theta (1-\theta)\, C_t \int \frac{G_{tj} F_{tj}}{(1-\theta) G_{tj} + \theta F_{tj}}.$
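
The "reduction to the same denominator" invoked in Step 1 rests on a pointwise algebraic identity, sketched below with generic notation: $F, G > 0$ are scalars and $a, b \in \mathbb{R}^d$ play the roles of $\nabla_1 F_{tj}$ and $\nabla_1 G_{tj}$ (these names are introduced only for the illustration).

```latex
% Generic notation (not from the paper): F, G > 0, a, b \in \mathbb{R}^d, \theta \in (0,1),
% D := \theta F + (1-\theta) G.
\theta\,\frac{|a|^2}{F} + (1-\theta)\,\frac{|b|^2}{G} - \frac{|\theta a + (1-\theta) b|^2}{D}
  \;=\; \theta(1-\theta)\,\frac{|G a - F b|^2}{F\,G\,D}
  \;=\; \theta(1-\theta)\,\frac{F\,G}{D}\,\Big|\frac{a}{F} - \frac{b}{G}\Big|^2 .
% Check: multiplying the left-hand side by F G D, the terms in \theta^2 |a|^2 F G and
% (1-\theta)^2 |b|^2 F G cancel, leaving
%   \theta(1-\theta)\,\big(|a|^2 G^2 + |b|^2 F^2 - 2\, a \cdot b\, F G\big) = \theta(1-\theta)\,|G a - F b|^2 .
% With a = \nabla_1 F and b = \nabla_1 G, a/F - b/G = \nabla_1 \ln (F/G).
```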

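Step 2. Disjunction of the supports. Let us introduce for any $s \in (0,r)$ the two measures
on $P(E)$ (which are not necessarily probability measures)

$F' := 1_{B_s} F = \frac{1}{\theta}\, 1_{B_s}\, \pi, \quad F'' := 1_{B_r \setminus B_s} F,$ so that $F' + F'' = F,$

and let us observe that

$\lim_{s \to r} \int F''(d\rho) = \lim_{s \to r} \int 1_{B_r \setminus B_s}(\rho)\, F(d\rho) = 0,$

by Lebesgue's dominated convergence theorem. For any $t > 0$ and $j \ge 1$ there holds
$F'_{tj} + F''_{tj} = F_{tj}$ with $F''_{tj} \ge 0$, so that we may write for any $\varepsilon > 0$

$A_{tj} \le 4 \theta (1-\theta)\, C_t \int \frac{G_{tj} F'_{tj}}{(1-\theta) G_{tj} + \theta F'_{tj}} + 4 \theta\, C_t \int F''_{tj}$
$\le 4 \theta (1-\theta)\, C_t \int \frac{G_{tj} F'_{tj}}{(1-\theta) G_{tj} + \theta F'_{tj}} + \varepsilon,$

taking $s$ close enough to $r$, and this independently of $j$ and $t$ because

$\int_{E^j} F''_{tj} = \int_{P(E)} F''_t(d\rho) = \int_{P(E)} F''(d\rho).$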

Step 3. Concentration. We introduce the real numbers u = and 5 = depending 
on e, as well as the set 

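$\tilde B_u := \{X^j = (x_1, \dots, x_j),\ W_1(\mu^j_{X^j}, f_1) < u\} \subset E^j,$

which is nothing but the reciprocal image of the ball $B_u \subset P(E)$ by the empirical measure
map. Using that

$\frac{G_{tj} F'_{tj}}{(1-\theta) G_{tj} + \theta F'_{tj}} \le \frac{1}{\theta}\, G_{tj}\, 1_{\tilde B_u} + \frac{1}{1-\theta}\, F'_{tj}\, 1_{\tilde B_u^c},$

we get

(5.24) $A_{tj} \le 4 C_t \Big[ (1-\theta) \int_{\tilde B_u} G_{tj} + \theta \int_{\tilde B_u^c} F'_{tj} \Big] + \varepsilon.$

If $\rho$ belongs to the support of $F'$ and $X^j \in \tilde B_u^c$, we have thanks to the last estimate in
Lemma 5.8

$W_1(\mu^j_{X^j}, \rho_t) \ge u - s - C_d\, t \ge \delta/2,$

for any $t \in [0,T(\varepsilon)]$, $T(\varepsilon) > 0$. We first assume that $\pi \in P_m(P(E))$ for some $m > 0$,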
which implies also that F,G £ P m (P (E)). Gathering this information with the Chebychev 
inequality, estimate (2.30) and estimate (5.20), we conclude that



i%. = / (pf 3 ,l §a )F'(dp) 

'Be Jp {E ) 

< -J (f W^ xj ,p t )pT{dXi))F>{dp) 

JP(E) \JEi J 

(5.25) < [ M m (p t ) 1/m F'(dp) < (M m {F) + StS^ , 

with $\gamma := 1/(d + 2 + d/m)$. With exactly the same arguments, we prove that for any $\varepsilon > 0$
and any $t \in [0, T(\varepsilon)]$

(5.26) / G tj <^-M m (G) l / m + e. 

JB U OJ 1 

Gathering (5.24) with (5.25) and (5.26), we get that for any $\varepsilon > 0$, $t \in (0,T(\varepsilon)]$ and
$j \ge 1$,

4aM m (^)V- 

" ^ + ' 

and then for any $\varepsilon > 0$, $t \in (0, T(\varepsilon)]$

(5.27) $\limsup_{j \to \infty} A_{tj} \le 3\varepsilon.$

Step 3'. Adaptation for $\pi \notin P_m(P(E))$. In the case when $\pi \notin P_m(P(E))$ whatever
$m > 0$ is, we can still prove that (5.27) holds true for any $\varepsilon$ and $t \in (0, T(\varepsilon))$, where $T(\varepsilon)$ is
small enough. Remark that this cannot happen if $E$ is a smooth bounded open set, so
we only have to deal with the case $E = \mathbb{R}^d$ here.

The idea is the same as in the proof of Lemma 5.6. We choose a function $g : \mathbb{R}_+ \to \mathbb{R}_+$
satisfying (5.15) together with $g(2x) \le 2 g(x)$ for all $x \ge 0$. We argue by using moments
with respect to $g(\langle \cdot \rangle)$ rather than to $\langle \cdot \rangle^m$. The property $g(2x) \le 2 g(x)$ ensures that the
estimate on the moments in Lemma 5.8 is still true with the moment $M_g$.

Next, for any $R > 0$, we introduce the mapping $P_R : \mathbb{R}^d \to \mathbb{R}^d$ defined by

$P_R(x) := x$ if $|x| \le R$, $\quad P_R(x) := R\, \frac{x}{|x|}$ else.

Using the concentration estimate (2.30) for the probability $(\rho \circ P_R^{-1})^{\otimes N} = (P_R \# \rho)^{\otimes N}$, it
is still possible to deduce that

C n , 2M g ( Pt ) 



J^W 1 (v j XJ , P t)p? 3 (dXi)<-^R + 



Summing up with respect to $\pi$, choosing $R$ large enough and letting $j \to +\infty$, we get the
claimed inequality for the limsup (with maybe a $4\varepsilon$ in place of the $3\varepsilon$).

Step 4. Conclusion. The regularization by convolution (or with the heat flow) implies
that for any $\alpha \in P(P(E))$

$\mathcal{I}'(\alpha_t) = \sup_{j} I(\alpha_{tj}) = \sup_{j} I(\alpha_{jt}) \le \sup_{j} I(\alpha_j) = \mathcal{I}'(\alpha).$

Moreover, the last point in Lemma 5.8 implies that $\alpha_t \to \alpha$ with respect to the $W_1$-metric.
Thanks to the previous inequality and the l.s.c. property of $\mathcal{I}'$, we obtain

(5.28) $\mathcal{I}'(\alpha) = \lim_{t \to 0} \mathcal{I}'(\alpha_t).$

Turning back to the definition of $A_{tj}$, the estimate (5.27) and the above properties, we
obtain for any $\varepsilon > 0$, $t \in (0, T(\varepsilon)]$

$\mathcal{I}'(\pi) \ge \mathcal{I}'(\pi_t)$

$\ge \theta\, \mathcal{I}'(F_t) + (1-\theta)\, \mathcal{I}'(G_t) - 3\varepsilon.$

Passing to the limit $t \to 0$ and using (5.28) we get

$\mathcal{I}'(\pi) \ge \theta\, \mathcal{I}'(F) + (1-\theta)\, \mathcal{I}'(G) - 3\varepsilon,$

for any $\varepsilon > 0$, which concludes the proof of (5.23) since the reverse inequality is just a
consequence of the convexity of the functional $\mathcal{I}'$. □

Proof of Theorem 5.7. We only have to observe that $I_j$, $\mathcal{I}$ and $\mathcal{I}'$ fulfil the
assumptions of Lemma 5.6. Assumption (i) is a consequence of Lemma 3.6,
assumption (ii) is a consequence of Lemma 3.5, assumption (iii) is proved in Lemma 3.6
and assumption (iv) in Lemma 5.10. Then (5.18) and (5.19) are exactly the conclusion of
Lemma 5.6 adapted to the Fisher information. □

Proposition 5.11. Consider $\pi \in P(P(E))$ and $(\pi_j)$ the associated family of compatible
and symmetric probability measures in $P(E^j)$ defined as in the De Finetti, Hewitt & Savage
theorem. For any $p \in [1, +\infty]$, the following equality holds

(5.29) $\pi$-Suppess $\{\|\rho\|_p,\ \rho \in P(E)\} = \sup_{j} \|\pi_j\|_p^{1/j} = \lim_{j \to \infty} \|\pi_j\|_p^{1/j}.$

It is part of the result that the limit exists. In particular, it implies the equivalence

$\forall j \in \mathbb{N}, \ \|\pi_j\|_{L^p(E^j)} \le C^j \quad \Longleftrightarrow \quad \pi$-Suppess $\{\|\rho\|_p,\ \rho \in P(E)\} \le C.$

Proof of Proposition 5.11. First remark that there is nothing to prove for $p = 1$ since
we are dealing with probability measures. Now, one inequality is a simple consequence of
the De Finetti, Hewitt & Savage theorem. In fact, using the definition of $\pi_j$, we get

$\|\pi_j\|_p = \Big\| \int_{P(E)} \rho^{\otimes j}\, \pi(d\rho) \Big\|_p$

$\le \int_{P(E)} \|\rho^{\otimes j}\|_p\, \pi(d\rho) = \int_{P(E)} \|\rho\|_p^j\, \pi(d\rho),$

and the last quantity is clearly bounded by $M^j$, $M := \pi$-Suppess $\{\|\rho\|_p,\ \rho \in P(E)\}$.

For the reverse inequality, we denote by $q \in (1,+\infty]$ the real conjugate to $p$. Because
$L^q(E) = (L^p(E))'$, the Hahn-Banach separation theorem implies that for any $\lambda < M$ there
exists $f$ in the unit ball of $L^q(E)$ so that the set

$B := \{\rho \in P(E)\ \text{s.t.}\ \int f(x)\, \rho(dx) > \lambda\}$

is of positive $\pi$-measure: $\delta := \int_B \pi(d\rho) > 0$. Now for any $j \in \mathbb{N}$

$\|\pi_j\|_p \ge \int_{E^j} f^{\otimes j}\, d\pi_j = \int_{P(E)} \Big( \int_{E^j} f^{\otimes j} \rho^{\otimes j} \Big)\, d\pi(\rho) \ge \delta\, \lambda^j,$

which implies the reverse inequality $M \le \lim_{j \to +\infty} \|\pi_j\|_p^{1/j}$. □
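
The passage from the lower bound $\delta \lambda^j$ to the reverse inequality is a $j$-th root argument, sketched here for completeness (with $\delta > 0$ and $\lambda < M$ as in the proof above, and $\lambda > 0$ without loss of generality).

```latex
% Sketch; \delta > 0 is fixed (it depends on \lambda but not on j), and 0 < \lambda < M.
\|\pi_j\|_p^{1/j} \;\ge\; \delta^{1/j}\,\lambda \;\xrightarrow[j \to \infty]{}\; \lambda ,
\qquad\text{so}\qquad
\liminf_{j \to \infty} \|\pi_j\|_p^{1/j} \;\ge\; \lambda \quad \text{for every } \lambda < M .
% Letting \lambda \uparrow M and combining with the upper bound \|\pi_j\|_p^{1/j} \le M
% gives both the existence of the limit and its value M.
```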
5.4. Strong version of De Finetti, Hewitt and Savage theorem and strong
convergence in $P(E^N)$. We begin that section with an HWI inequality valid on $P(P(E))$,
which is just a "summation" of the usual one and will be very useful in the sequel.

Proposition 5.12. Assume $E = \mathbb{R}^d$ or more generally that (3.15) holds for $N = 1$. For
any $\alpha, \beta \in P(P_2(E))$, we have

(5.30) $\mathcal{H}(\alpha) \le \mathcal{H}(\beta) + C_E \sqrt{\mathcal{I}(\alpha)}\ W_2(\alpha,\beta).$

As a consequence, the entropy $\mathcal{H}$ is continuous on bounded sets relatively to $\mathcal{I}$. More
precisely, if $(\pi^n)$ is a bounded sequence of $P_m(P(E))$, $m > 0$, such that

$\pi^n \rightharpoonup \pi$ weakly in $P(P(E))$ and $\mathcal{I}(\pi^n) \le C$,

then $\mathcal{H}(\pi^n) \to \mathcal{H}(\pi)$.

Proof of Proposition 5.12. A first way to prove (5.30) is just to pass to
the limit in the HWI inequality (3.15) for $\alpha_j$ and $\beta_j$ and use the inequality stated in
Lemma 2.7 for the quadratic cost, and the results of the previous sections about the level-3
entropy and Fisher information, (5.9) and (5.18).

Another possibility is to sum up the HWI inequality (3.16) for $\rho \in P(E)$. Choosing an
optimal transference plan $\Pi$ for $W_2$ between $\alpha$ and $\beta$, we have

$\int H(\rho)\, \Pi(d\rho,d\eta) \le \int H(\eta)\, \Pi(d\rho,d\eta) + \int \sqrt{I(\rho)}\ W_2(\rho,\eta)\, \Pi(d\rho,d\eta),$

so that

$\mathcal{H}(\alpha) \le \mathcal{H}(\beta) + \Big( \int I(\rho)\, \Pi(d\rho,d\eta) \Big)^{1/2} \Big( \int W_2(\rho,\eta)^2\, \Pi(d\rho,d\eta) \Big)^{1/2},$

thanks to the Cauchy-Schwarz inequality. It leads to the desired inequality.

The second point is obtained by two applications of the previous inequality, leading to

$|\mathcal{H}(\pi^n) - \mathcal{H}(\pi)| \le C_E \Big( \sqrt{\mathcal{I}(\pi)} + \sqrt{\mathcal{I}(\pi^n)} \Big)\, W_2(\pi^n, \pi),$

and then using the l.s.c. property of the level-3 Fisher information in order to prove that
$\mathcal{I}(\pi) < \infty$. We conclude by remarking that the RHS converges to $0$ as $n$ tends to $\infty$. □
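
In the second proof of (5.30), the two factors produced by the Cauchy-Schwarz inequality are identified through the marginals of the transference plan. A sketch, with $\Pi$ an optimal plan for $W_2$ between $\alpha$ and $\beta$ as above:

```latex
% Sketch; \Pi has first marginal \alpha and second marginal \beta, and is optimal for W_2.
\int \sqrt{I(\rho)}\; W_2(\rho,\eta)\; \Pi(d\rho,d\eta)
  \;\le\; \Big( \int I(\rho)\, \Pi(d\rho,d\eta) \Big)^{1/2}
          \Big( \int W_2(\rho,\eta)^2\, \Pi(d\rho,d\eta) \Big)^{1/2}
  \;=\; \sqrt{\mathcal{I}(\alpha)}\;\, W_2(\alpha,\beta) ,
% because the first integrand depends on \rho only (so it integrates against the marginal \alpha,
% giving \mathcal{I}(\alpha) by definition (5.17)), and the second integral is W_2(\alpha,\beta)^2
% by optimality of \Pi.
```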

The results of the preceding section and the HWI inequality make it possible to compare
different senses of convergence for sequences of $P(E^N)$, $N \to \infty$, without any assumption
of chaos.

Theorem 5.13. Assume $E = \mathbb{R}^d$ or $E \subset \mathbb{R}^d$ is a bounded connected open subset with
smooth boundary and that (3.15) holds. Consider $(F^N)$ a sequence of $P_{sym}(E^N)$ and
$\pi \in P(P(E))$ such that $F^N \rightharpoonup \pi$ weakly in $P_k(E^j)$ $\forall j$, $k > 2$.

(1) In the list of assertions below, each assertion implies the one which follows:

(i) $I(F^N) \to \mathcal{I}(\pi)$, $\mathcal{I}(\pi) < \infty$;

(ii) $I(F^N)$ is bounded;

(iii) $H(F^N) \to \mathcal{H}(\pi)$, $\mathcal{H}(\pi) < \infty$.

(2) More precisely, the following version of the implication (ii) $\Rightarrow$ (iii) holds. There exists
a numerical constant $C$ such that for any $k > 2$ and $K > 0$, and for any sequence
$(F^N)$ of $P_{sym}(E^N)$ satisfying

$\forall N \quad M_k(F^N_1) \le K^k, \quad I(F^N) \le K^2,$
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there holds 

(5.31) ViV>4 M \H(F N ) -n(ir)\ < KW 2 {F N ,ir N ) + CK d ' l ^j^p , 
with 7 := k(i + 2d~)+4d-2 an d as usua l d' = max(2, d). 

(3) In particular, for any sequence $(\pi_j)$ of symmetric and compatible probability measures
of $P(E^j)$ satisfying

$M_k(\pi_1) \le K^k, \quad \forall j \ge 1 \quad I(\pi_j) \le K^2,$

there holds

(5.32) Vj > 4 M , iHfa) - H(n)\ < CK d ' l -^l 

for the same value of $\gamma$. In other words, (5.32) gives a rate of convergence for the limit
(5.9).

The fact that the constant $C$ does not depend on $k$ is interesting when the space
$E$ is compact or the measures $F^N$ have strong integrability properties, for instance an
exponential moment. It allows us to choose $k$ large and get almost the largest possible
exponent $\gamma$. Precise versions of point (iii) are stated (without proofs) in the corollary
below.



Corollary 5.14. (i) In the case where $E$ is compact, we denote $K := \max(\mathrm{diam}(E), \sqrt{\mathcal{I}(\pi)})$.
Then there holds for all j > 4 2d 

(5.33) \H{ir j )-H(ir)\<CK d ' ] ^^ with-y- 1 



2d+ 1 



(ii) If Mp^xi^l) '■= Je e x ^ f: \\{dx) < +00 for some A > andl(ir) < +00, there exists 
a constant C(d, {3, \,I(tt)) such that for j large enough ( > C hxMp t \(TT\)) 

n jn+d'/fi 1 
(5.34) \H(TTj) -H(ir)\< C [ -^-. with 7 - 



p 1 2d + l 

Proof of Theorem 5.13. We split the proof into four steps.

Step 1. (i) implies (ii) is clear. For (ii) implies (iii), we use the HWI inequality (3.15) and
we write

$|H(F^N) - \mathcal{H}(\pi)| = |H(F^N) - H(\pi_N) + H(\pi_N) - \mathcal{H}(\pi)|$

$\le C_E \Big( \sqrt{I(F^N)} + \sqrt{I(\pi_N)} \Big)\, W_2(F^N, \pi_N) + |H(\pi_N) - \mathcal{H}(\pi)|.$

We know from (5.9) that $\mathcal{H}(\pi) = \lim H(\pi_N)$ and from (5.18) and (5.19) that
$I(\pi_N) \le \mathcal{I}(\pi) \le \liminf I(F^N) \le K^2$, from which we conclude that there exists a sequence
$\varepsilon_\pi(N) \to 0$ such that

$|H(F^N) - \mathcal{H}(\pi)| \le 2 C_E\, K\, W_2(F^N, \pi_N) + \varepsilon_\pi(N).$

We now aim to estimate $\varepsilon_\pi(N)$ more explicitly, as claimed in point (3). Then (2) will be
a direct consequence of (3) and the above estimate.

From now on, we only consider the case $E = \mathbb{R}^d$ since the general case is similar (and
the case when $E$ is compact is even simpler).
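
The first term in the bound of Step 1 comes from applying the HWI inequality in both directions. The sketch below assumes that (3.15), which is not restated in this section, has the same form as (5.30) for the normalized functionals on $P(E^N)$, namely $H(F) \le H(G) + C_E \sqrt{I(F)}\, W_2(F,G)$.

```latex
% Sketch; assumes (3.15) reads H(F) - H(G) \le C_E \sqrt{I(F)}\, W_2(F,G) on P(E^N).
% Applying it to the pairs (F^N, \pi_N) and (\pi_N, F^N):
H(F^N) - H(\pi_N) \;\le\; C_E \sqrt{I(F^N)}\;\, W_2(F^N, \pi_N), \qquad
H(\pi_N) - H(F^N) \;\le\; C_E \sqrt{I(\pi_N)}\;\, W_2(F^N, \pi_N),
% hence
|H(F^N) - H(\pi_N)| \;\le\; C_E \Big( \sqrt{I(F^N)} + \sqrt{I(\pi_N)} \Big)\, W_2(F^N, \pi_N) .
```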
Step 2. From [12, Theorem A.1] we know that for any $R, \delta > 0$ we may cover $P(B_R)$
by $\mathcal{N}(R, \delta/2)$ balls of radius $\delta/2$ in $W_1$ distance (which is less accurate than the one
considered in the above quoted result) with

M(R,5) < [-^-) 

where the constants $C_1'$ and $C_2'$ are numerical. Let us fix $a \ge 1$ and recall that we define $BP_{k,a}(E) := \{\rho \in P(E) \text{ s.t. } M_k(\rho) \le a\}$. Next, for any $\rho \in BP_{k,a}(E)$, we define $\rho_R \in P(B_R)$ by $\rho_R = \rho(B_R)^{-1}\, \rho\, \mathbf 1_{B_R}$ for $R$ large enough (so that it defines a probability measure), and we observe that for any $f \in P(E)$ we have

and that for any $R$ such that $R^k \ge 2a$

$W_1(\rho_R, \rho) \le \|\rho_R - \rho\|_{TV} \le \left| 1 - \frac{1}{\rho(B_R)} \right| + \rho(B_R^c) \le \frac{3a}{R^k},$

since then $\rho(B_R) \ge 1 - \frac{a}{R^k} \ge \frac12$.
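As a sanity check of this truncation estimate, here is a minimal Python sketch on a discrete measure on the real line. The atoms, the weights and the factor 1.5 in the choice of $R$ are hypothetical, and the total variation norm is assumed to be the sum of the absolute differences of the weights. The sketch checks the Chebyshev-type bound $\rho(B_R) \ge 1 - a/R^k$ and the bound $\|\rho_R - \rho\|_{TV} \le 3a/R^k$ for $R^k \ge 2a$.

```python
def moment(atoms, weights, k):
    """k-th absolute moment of the discrete measure sum_i weights[i] * delta_{atoms[i]}."""
    return sum(w * abs(x) ** k for x, w in zip(atoms, weights))

def truncate(atoms, weights, R):
    """Restrict the measure to the ball |x| <= R and renormalize it to mass one."""
    mass_in = sum(w for x, w in zip(atoms, weights) if abs(x) <= R)
    new_weights = [w / mass_in if abs(x) <= R else 0.0
                   for x, w in zip(atoms, weights)]
    return new_weights, mass_in

# Hypothetical example data (the weights sum to one).
atoms = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]
weights = [0.4, 0.3, 0.2, 0.07, 0.02, 0.01]
k = 2

a = max(1.0, moment(atoms, weights, k))   # any a >= max(1, M_k(rho)) would do
R = 1.5 * (2 * a) ** (1 / k)              # guarantees R^k >= 2a
trunc, mass_in = truncate(atoms, weights, R)

# Total variation taken here as the l^1 distance between the weight vectors.
tv = sum(abs(t - w) for t, w in zip(trunc, weights))
bound = 3 * a / R ** k
print(mass_in, tv, bound)
```

In this discrete setting the total variation distance equals exactly twice the mass outside the ball, so the factor 3 leaves some room.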

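As a consequence, for any $\delta \le 1$ and $a \ge 1$, choosing $R$ such that $3a/R^k = \delta/2$ in the two preceding estimates, we may cover $BP_{k,a}(E)$ by $\mathcal N_a(\delta) = \mathcal N(R,\delta/2)$ balls of radius $\delta$ in $W_1$ distance, with

$\frac{1}{\delta} \le \mathcal N_a(\delta) \le \left( C_1\, a^{\frac1k}\, \delta^{-1-\frac1k} \right)^{C_2\, a^{\frac dk}\, \delta^{-d(1+\frac1k)}}.$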

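The above lower bound on $\mathcal N_a(\delta)$ is straightforwardly obtained by considering balls centered on Dirac masses distributed on a line. In the sequel, we shall often use the shortcut $\mathcal N = \mathcal N_a(\delta)$. Let us then introduce a covering family $\omega_i^\delta \subset BP_{k,a}(E)$, $1 \le i \le \mathcal N_a(\delta)$, such that

$\sup_{\rho,\eta \in \omega_i^\delta} W_1(\rho,\eta) \le 2\delta, \qquad \omega_i^\delta \cap \omega_j^\delta = \emptyset \ \text{ if } i \neq j, \qquad BP_{k,a}(E) = \bigcup_{i=1}^{\mathcal N_a(\delta)} \omega_i^\delta,$

as well as the masses and centers of mass

$\alpha_i^\delta := \pi(\omega_i^\delta), \qquad f_i^\delta := \frac{1}{\alpha_i^\delta} \int_{\omega_i^\delta} \rho\, \pi(d\rho).$

We also denote $\omega_0^\delta := [BP_{k,a}(E)]^c$ and $\alpha_0^\delta := \pi(\omega_0^\delta)$, so that $\sum_{i=0}^{\mathcal N} \alpha_i^\delta = 1$. Denoting $Z := \{ i = 1, \dots, \mathcal N_a(\delta);\ \alpha_i^\delta \ge \mathcal N_a(\delta)^{-2} \}$, we finally define

$\pi^\delta := \sum_{i=1}^{\mathcal N_a(\delta)} \beta_i^\delta\, \delta_{f_i^\delta}, \quad \text{with } \beta_i^\delta := \frac{\alpha_i^\delta}{\sum_{j \in Z} \alpha_j^\delta} \text{ if } i \in Z \text{ and } \beta_i^\delta := 0 \text{ if } i \notin Z.$

Remark that by our moment assumption

$\alpha_0^\delta = \int_{\{M_k(\rho) > a\}} \pi(d\rho) \le \int_{P(E)} \frac{M_k(\rho)}{a}\, \pi(d\rho) = \frac{M_k(\pi_1)}{a}.$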



Since $\sum_{i \notin Z,\, i \ge 1} \alpha_i^\delta \le \mathcal N^{-1} \le \delta$, we necessarily have $Z \neq \emptyset$ if $\delta + \frac{M_k(\pi_1)}{a} \le \frac12 < 1$, an assumption that we will make in the sequel. We now fix the value of $a$ so that

$\delta = \frac{M_k(\pi_1)}{a}.$

As we shall see, this will lead to the optimal inequality. With that particular choice, the condition above simply reads $\delta \le \frac14$, and the upper bound on $\mathcal N$ may be rewritten

(5.35) $\mathcal N(\delta) := \mathcal N_a(\delta) \le \left[ C_1\, K\, \delta^{-1-\frac2k} \right]^{C_2\, K^d\, \delta^{-d(1+\frac2k)}}.$

In that case, we have

(5.36) $\sum_{j \notin Z,\, j \ge 0} \alpha_j^\delta \le 2\delta, \qquad 1 \ge \sum_{j \in Z} \alpha_j^\delta \ge 1 - 2\delta \ge \frac12.$

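Now, by convexity of the Fisher information

$I(f_i^\delta) \le \frac{1}{\alpha_i^\delta} \int_{\omega_i^\delta} I(\rho)\, \pi(d\rho),$

which in turn implies that

$\mathcal I(\pi^\delta) = \sum_{i=1}^{\mathcal N} \beta_i^\delta\, I(f_i^\delta) \le \frac{1}{\sum_{j \in Z} \alpha_j^\delta} \sum_{i \in Z} \int_{\omega_i^\delta} I(\rho)\, \pi(d\rho) \le 2\, \mathcal I(\pi).$

Similarly, for the moment of order $k$:

$M_k(\pi_1^\delta) = \sum_{i=1}^{\mathcal N} \beta_i^\delta\, M_k(f_i^\delta) \le \frac{1}{\sum_{j \in Z} \alpha_j^\delta} \sum_{i \in Z} \int_{\omega_i^\delta} M_k(\rho)\, \pi(d\rho) \le 2\, M_k(\pi_1).$

In order to prove (5.32), we introduce the splitting

(5.37) $|H(\pi_j) - \mathcal H(\pi)| \le |H(\pi_j) - H(\pi_j^\delta)| + |H(\pi_j^\delta) - \mathcal H(\pi^\delta)| + |\mathcal H(\pi^\delta) - \mathcal H(\pi)|,$

where we have written $\pi_j^\delta := (\pi^\delta)_j$. We now estimate each term separately.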

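Step 3. On the one hand, defining $T^\delta : P(E) \to \{f_0^\delta, \dots, f_{\mathcal N}^\delta\}$ by $T^\delta(\rho) = f_i^\delta$ if $\rho \in \omega_i^\delta$, $T^\delta(\rho) = f_0^\delta := \delta_0$ if $\rho \in \omega_0^\delta$, and setting $\beta_0^\delta := 0$, we compute

$\mathcal W_1(\pi, \pi^\delta) \le \mathcal W_1\Big( \pi, \sum_{i=0}^{\mathcal N} \alpha_i^\delta\, \delta_{f_i^\delta} \Big) + \mathcal W_1\Big( \sum_{i=0}^{\mathcal N} \alpha_i^\delta\, \delta_{f_i^\delta}, \sum_{i=1}^{\mathcal N} \beta_i^\delta\, \delta_{f_i^\delta} \Big)$
$\qquad \le \int_{P(E) \times P(E)} W_1(\rho,\eta)\, [(\mathrm{Id} \otimes T^\delta) \sharp \pi](d\rho, d\eta) + \Big\| \sum_{i=0}^{\mathcal N} (\alpha_i^\delta - \beta_i^\delta)\, \delta_{f_i^\delta} \Big\|_{TV}$
$\qquad \le \int_{P(E)} W_1(\rho, T^\delta(\rho))\, \pi(d\rho) + \sum_{i=1}^{\mathcal N} |\alpha_i^\delta - \beta_i^\delta| + |\alpha_0^\delta|$
$\qquad \le \delta + \frac{M_k(\pi_1)}{a} + 6\delta \le 8\delta,$

where we have used estimate (5.36) several times, in particular in order to get the inequality

$\sum_{i=1}^{\mathcal N} |\alpha_i^\delta - \beta_i^\delta| + \alpha_0^\delta = \Big( \frac{1}{\sum_{i \in Z} \alpha_i^\delta} - 1 \Big) \sum_{i \in Z} \alpha_i^\delta + \sum_{i \notin Z} \alpha_i^\delta \le 3 \sum_{i \notin Z} \alpha_i^\delta.$

Using Lemma 2.3 and the bound on $M_k(\pi_1)$, we obtain a bound on $\mathcal W_2(\pi^\delta, \pi)$ as follows (we recall that the constant $C$ that appears is numerical: $C = 2^{3/2}$)

$\mathcal W_2(\pi^\delta, \pi) \le C\, M_k(\pi_1)^{1/k}\, \mathcal W_1(\pi^\delta, \pi)^{1/2 - 1/k} \le 4\, C\, K\, \delta^{1/2 - 1/k}.$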

Now, we use the HWI inequality on $P(E)$ stated in Proposition 5.12 and we bound the first term in (5.37) by

$|H(\pi_j^\delta) - H(\pi_j)| \le 2K\, \mathcal W_2(\pi^\delta, \pi),$

and the third term in (5.37) very similarly

$|\mathcal H(\pi^\delta) - \mathcal H(\pi)| \le 2K\, \mathcal W_2(\pi^\delta, \pi),$

where we have used the properties (5.18) of the level 3 Fisher information and Lemma 2.7 in order to bound $W_2$ by $\mathcal W_2$. All together, we have proved

(5.38) $|\mathcal H(\pi^\delta) - \mathcal H(\pi)| + |H(\pi_j^\delta) - H(\pi_j)| \le C\, K^2\, \delta^{1/2 - 1/k},$

for some numerical constant $C \le 2^6$.
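The weight renormalization used in Step 3 can be illustrated numerically. The Python sketch below uses a hypothetical vector of cell masses `alpha`, with `alpha[0]` playing the role of the mass of the complement cell. It builds the index set $Z$ and the renormalized weights $\beta_i$, then checks that $\sum_{i \ge 1} |\alpha_i - \beta_i| + \alpha_0$ equals twice the discarded mass $1 - \sum_{i \in Z} \alpha_i$, which is in particular at most three times that mass.

```python
def renormalized_weights(alpha):
    """Threshold and renormalize cell masses.

    alpha[0] is the mass of the complement cell, alpha[1:] are the masses
    of the N covering cells. Cells with mass below N**-2 are discarded and
    the remaining masses are rescaled to sum to one.
    """
    N = len(alpha) - 1
    Z = [i for i in range(1, N + 1) if alpha[i] >= N ** -2]
    S = sum(alpha[i] for i in Z)          # retained mass; requires Z nonempty
    beta = [0.0] * (N + 1)
    for i in Z:
        beta[i] = alpha[i] / S
    return Z, S, beta

# Hypothetical masses: N = 5 cells, threshold N**-2 = 0.04.
alpha = [0.02, 0.5, 0.3, 0.17, 0.005, 0.005]
Z, S, beta = renormalized_weights(alpha)

lhs = sum(abs(alpha[i] - beta[i]) for i in range(1, len(alpha))) + alpha[0]
discarded = 1.0 - S
print(Z, lhs, discarded)
```

The function divides by the retained mass, so it assumes that $Z$ is nonempty, which is what the condition on $\delta$ guarantees in the proof.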

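Step 4. We estimate the second term in (5.37). Using that $\pi_j^\delta = \sum_{i=1}^{\mathcal N} \beta_i^\delta\, (f_i^\delta)^{\otimes j}$, we write

$H(\pi_j^\delta) - \mathcal H(\pi^\delta) = \frac1j \int_{E^j} \pi_j^\delta \log \pi_j^\delta - \sum_{i=1}^{\mathcal N} \beta_i^\delta\, \frac1j \int_{E^j} (f_i^\delta)^{\otimes j} \log (f_i^\delta)^{\otimes j}$
$\qquad = \frac1j \int_{E^j} \sum_{i=1}^{\mathcal N} \beta_i^\delta\, (f_i^\delta)^{\otimes j} \log \frac{\pi_j^\delta}{(f_i^\delta)^{\otimes j}}$
$\qquad = \frac1j \int_{E^j} \Lambda\Big( \frac{\beta_1^\delta (f_1^\delta)^{\otimes j}}{\pi_j^\delta}, \dots, \frac{\beta_{\mathcal N}^\delta (f_{\mathcal N}^\delta)^{\otimes j}}{\pi_j^\delta} \Big)\, \pi_j^\delta,$

with $\Lambda : \{ U = (u_i) \in \mathbb R_+^{\mathcal N},\ \sum_i u_i = 1 \} \to \mathbb R$ defined by

$\Lambda(U) := u_1 \log \frac{\beta_1^\delta}{u_1} + \dots + u_{\mathcal N} \log \frac{\beta_{\mathcal N}^\delta}{u_{\mathcal N}}.$

Observing that $\Lambda$ is in fact (the opposite of) a discrete relative entropy, we have for any $U \in \mathbb R_+^{\mathcal N}$ with $\sum_i u_i = 1$

$-\log(\mathcal N^2) \le \log\big( \min_{i \in Z} \beta_i^\delta \big) \le \Lambda(U) \le 0,$

and we deduce

$|H(\pi_j^\delta) - \mathcal H(\pi^\delta)| \le \frac2j \log \mathcal N_a(\delta).$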
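The two-sided bound on $\Lambda$ is elementary and can be checked numerically. The following Python sketch assumes the reading $\Lambda(U) = \sum_i u_i \log(\beta_i / u_i)$, with the convention $0 \log(\cdot) = 0$, and uses hypothetical random weights `beta` and random points `u` of the simplex. It verifies $\log(\min_i \beta_i) \le \Lambda(U) \le 0$ on the samples.

```python
import math
import random

def Lambda(u, beta):
    """Opposite of the discrete relative entropy of u with respect to beta."""
    return sum(ui * math.log(bi / ui) for ui, bi in zip(u, beta) if ui > 0)

def random_simplex_point(n, rng):
    """Random point with positive coordinates summing to one."""
    w = [rng.random() + 1e-12 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)                      # fixed seed for reproducibility
beta = random_simplex_point(6, rng)         # hypothetical weights beta_i > 0
samples = [random_simplex_point(6, rng) for _ in range(1000)]
values = [Lambda(u, beta) for u in samples]

lower = math.log(min(beta))
print(lower, min(values), max(values))
```

The upper bound is Jensen's inequality for the logarithm, and the lower bound follows from $-\sum_i u_i \log u_i \ge 0$ and $\sum_i u_i \log \beta_i \ge \log \min_i \beta_i$.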



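Step 5. All in all, observing that thanks to (5.35)

$\log \mathcal N(\delta) \le C\, K^d\, \delta^{-d(1+\frac2k)}\, [1 + \ln K - \ln \delta],$

we have

$|H(\pi_j) - \mathcal H(\pi)| \le C\, K^2\, \delta^{\frac12 - \frac1k} + \frac{C}{j}\, K^d\, \delta^{-d(1+\frac2k)}\, [1 + \ln K - \ln \delta].$

We can now (almost) optimize by choosing $\delta = j^{-r}$, with $r^{-1} := \frac12 - \frac1k + d\,(1 + \frac2k)$, and we obtain

$C^{-1}\, |H(\pi_j) - \mathcal H(\pi)| \le K^{\max(2,d)}\, \frac{\ln(Kj)}{j^{\,r(\frac12 - \frac1k)}}$

for the integers $j \ge 4^{1/r}$, so that the condition $\delta \le \frac14$ is fulfilled (in order to ensure that $Z \neq \emptyset$). But it can be checked that for $k \in [2, +\infty)$, $d \le \frac1r \le 2d$, so that the previous condition on $j$ is fulfilled for $j \ge 4^{2d}$.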
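The arithmetic behind the choice of $r$ can be checked with exact fractions. The Python sketch below assumes the reading $r^{-1} = \frac12 - \frac1k + d(1 + \frac2k)$ for the exponent (this reading is an assumption) and defines $\gamma = r(\frac12 - \frac1k)$ as the resulting power of $1/j$; the function name `rate_exponents` and the sample values of $k$ are hypothetical. It illustrates that $d \le 1/r \le 2d$ for $k \ge 2$ and that $\gamma$ approaches $1/(2d+1)$ from below as $k$ grows.

```python
from fractions import Fraction

def rate_exponents(k, d):
    """Return (1/r, gamma) as exact fractions.

    Assumed reading of the proof: 1/r = 1/2 - 1/k + d*(1 + 2/k),
    and gamma = r*(1/2 - 1/k) is the resulting power of 1/j.
    """
    half_minus = Fraction(1, 2) - Fraction(1, k)
    inv_r = half_minus + d * (1 + Fraction(2, k))
    return inv_r, half_minus / inv_r

# Hypothetical sample values of the moment order k, in dimension d = 3.
for k in (2, 4, 10, 100, 10_000):
    inv_r, gamma = rate_exponents(k, 3)
    print(k, float(inv_r), float(gamma))
```

Under this reading, $1/r = d + \frac12 + \frac{2d-1}{k}$, which equals $2d$ at $k = 2$ and decreases to $d + \frac12$ as $k \to \infty$, consistent with the exponent $\frac{1}{2d+1}$ of Corollary 5.14.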

□